(57) Abstract

Disclosed herein are techniques for directly identifying candidate compounds as agonists, partial agonists and/or, most preferably,
inverse agonists, to endogenous, constitutively activated orphan G protein-coupled receptors. Such directly identified compounds can be
utilized, most preferably, in pharmaceutical compositions.

ENDOGENOUS CONSTITUTIVELY ACTIVATED
G PROTEIN-COUPLED ORPHAN RECEPTORS 

The benefit of commonly owned: (1) Provisional Patent Application Serial Number 
60/094879, filed July 31, 1998; (2) Provisional Patent Application Serial Number 60/106300,
filed October 30, 1998; (3) Provisional Patent Application Serial Number 60/110,906, filed
December 4, 1998; and (4) Provisional Application Serial Number 60/121,851, filed February
26, 1999 is hereby claimed. This patent document is related to U.S. Serial Number
09/060,188, filed April 14, 1998. The entire disclosures of each of the foregoing patent 
applications are incorporated herein by reference. 

FIELD OF THE INVENTION 

The invention disclosed in this patent document relates to transmembrane receptors, 
more particularly to endogenous, constitutively active G protein-coupled receptors for which
the endogenous ligand is unknown, and most particularly to the use of such receptors for the
direct identification of candidate compounds via screening as agonists, partial agonists or
inverse agonists to such receptors.

BACKGROUND OF THE INVENTION

A. G protein-coupled receptors 

G protein-coupled receptors share a common structural motif. All these receptors have
seven sequences of between 22 and 24 hydrophobic amino acids that form seven alpha helices, each
of which spans the membrane. The transmembrane helices are joined by strands of amino acids
having a larger loop between the fourth and fifth transmembrane helix on the extracellular side of
the membrane. Another larger loop, composed primarily of hydrophilic amino acids, joins
transmembrane helices five and six on the intracellular side of the membrane. The carboxy
terminus of the receptor lies intracellularly with the amino terminus in the extracellular space. It
is thought that the loop joining helices five and six, as well as the carboxy terminus, interact with
the G protein. Currently, Gq, Gs, Gi, and Go are G proteins that have been identified. The general
structure of G protein-coupled receptors is shown in Figure 1.

Under physiological conditions, G protein-coupled receptors exist in the cell membrane 
in equilibrium between two different states or conformations: an "inactive" state and an "active" 
state. As shown schematically in Figure 2, a receptor in an inactive state is unable to link to the
intracellular transduction pathway to produce a biological response. Changing the receptor 
conformation to the active state allows linkage to the transduction pathway and produces a 
biological response. 

A receptor may be stabilized in an active state by an endogenous ligand or an exogenous
agonist ligand. Recent discoveries, including but not limited to modifications
to the amino acid sequence of the receptor, provide means other than ligands to stabilize the active
state conformation. These means effectively stabilize the receptor in an active state by simulating
the effect of a ligand binding to the receptor. Stabilization by such ligand-independent means is
termed "constitutive receptor activation." A receptor for which the endogenous ligand is unknown
or not identified is referred to as an "orphan receptor."

B. Traditional Compound Screening

Generally, the use of an orphan receptor for screening purposes to identify compounds that
modulate a biological response associated with such receptor has not been possible. This is
because the traditional "dogma" regarding screening of compounds mandates that the ligand for
the receptor be known, whereby compounds that competitively bind with the receptor, i.e., by
interfering with or blocking the binding of the natural ligand with the receptor, are selected. By
definition, then, this approach has no applicability with respect to orphan receptors. Thus, by 
adhering to this dogmatic approach to the discovery of therapeutics, the art, in essence, has taught 
and has been taught to forsake the use of orphan receptors unless and until the natural ligand for 
the receptor is discovered. The pursuit of an endogenous ligand for an orphan receptor can take 
several years and cost millions of dollars. 

Furthermore, and given that there are an estimated 2,000 G protein-coupled receptors in
the human genome, the majority of which are orphan receptors, the traditional dogma castigates
a creative approach to the discovery of therapeutics to these receptors.

C. Exemplary Orphan Receptors: GPR3, GPR4, GPR6, GPR12, GPR21,
GHSR, OGR1 and AL022171

GPR3 is a 330 amino acid G protein coupled receptor for which the endogenous ligand
is unknown. (Marchese, A. et al. (1994) Genomics 23:609; see also, Iismaa, T.P. et al. (1994)
Genomics 24:391; see Figure 1 for reported nucleic acid and amino acid sequence.) GPR3 is
constitutively active in its endogenous form. (Eggerick, D. et al. (1995) Biochem. J. 389:837).
GPR12 is a 334 amino acid homolog of GPR3; the endogenous ligand for GPR12 is unknown
(Song, Z.-H., et al. (1995) Genomics, 28:347; see Figure 1 for reported amino acid sequence).
GPR6 is a 362 amino acid homolog of GPR3; the endogenous ligand for GPR6 is unknown (Song,
Z.-H. et al., supra; see Figure 1 for reported amino acid sequence). GPR6 transcripts are reported
to be abundant in the human putamen and to a lesser extent in the frontal cortex, hippocampus, and
hypothalamus (Heiber, M. et al. DNA and Cell Biology (1995) 14(1): 25; see Figure 1 for reported
nucleic acid and amino acid sequences for GPR6). GPR4 has also been identified as an orphan
GPCR (Heiber, M. et al., 14 DNA Cell Biol 25 (1995)). OGR1, an orphan GPCR, is reported to
have a high level of homology with GPR4 (Xu, Y. and Casey, G., 35 Genomics 397 (1996)).
GPR21 is a 349 amino acid G protein coupled receptor for which the endogenous ligand is
unknown (see GenBank Accession # U66580 for nucleic acid and deduced amino acid sequence).
GPR21 has been reported to be located at chromosome 9q33, O'Dowd B. et al., 187 Gene 75
(1997). AL022171 is a human DNA sequence from clone 384F21 on chromosome 1q24.
AL022171 has been identified to contain an open reading frame of 1,086 bp encoding for a 361
amino acid protein (see GenBank Accession number AL022171). AL022171 is 68% homologous
to GPR21 (see Figure 5B). GHSR is also identified as an orphan GPCR (Howard, A.D. et al., 273
Science 974 (1996)).



SUMMARY OF THE INVENTION 

Disclosed herein are methods for screening of candidate compounds against 
endogenous, constitutively activated G protein-coupled orphan receptors (GPCRs) for the 
direct identification of candidate compounds as agonists, inverse agonists or partial agonists 
to such receptors. For such screening purposes, it is preferred that an endogenous,
constitutively activated orphan GPCR:G protein fusion protein be utilized.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a generalized structure of a G protein-coupled receptor with the numbers 
assigned to the transmembrane helixes, the intracellular loops, and the extracellular loops. 

Figure 2 schematically shows the two states, active and inactive, for a typical G protein 
coupled receptor and the linkage of the active state to the second messenger transduction pathway. 

Figure 3 is a computerized representation of a "dot-blot" showing the distribution of the
orphan receptor GPR4 across a variety of human tissues (see Appendix A for grid-code).

Figure 4 is a diagram showing enhanced binding of [35S]GTPγS to membranes prepared
from 293T cells transfected with the orphan receptor GPR3 compared to those transfected with
control vector alone at 75 µg/well membrane protein. The radiolabeled concentration of
[35S]GTPγS was held constant at 1.2 nM and the GDP concentration was held constant at 1 nM.
The assay was performed in 96-well format in Wallac scintistrips.

Figure 5A shows the amino acid alignment of orphan receptors GPR3, GPR6, and
GPR12. Figure 5B shows the amino acid alignment of orphan receptors GPR21 and AL022171
(Consensus #1 indicates matching residues).

Figure 6A is a diagram showing that the orphan receptors GPR3, GPR6, and GPR12 are
confirmed to be constitutively active by their enhanced ability to induce expression of
β-galactosidase from a CRE driven reporter system in VIP cells. Figure 6B and 6C are diagrams of
orphan receptors GPR21 and AL022171, respectively, that have also been confirmed to be
constitutively active by their enhanced ability to induce expression of the luciferase gene from a
CRE driven reporter system in both 293 and 293T cells.

Figures 7A, 7B and 7C show the relative distribution of the expression of the GPR3 (A),
GPR6 (B), and GPR12 (C) orphan receptors across several normal human tissues as determined
by RT-PCR. Abbreviations: Ocx = occipital cortex; Hypoth = hypothalamus; Tcx = temporal
cortex; Fcx = frontal cortex.

Figures 8A and 8B show GPR3 receptor expression in normal (A) and epileptic (B)
human brain tissue as examined by RT-PCR.

Figure 9A is a copy of an autoradiograph evidencing the results from in situ hybridization
(normal rat) using GPR6 probe; Figure 9B is a reference map of the corresponding region of the
rat brain.

Figure 10A is a copy of an autoradiograph evidencing the results from in situ
hybridization (Zucker rat - lean) using GPR6 probe; Figure 10B is a copy of an autoradiograph
evidencing the results from in situ hybridization (Zucker rat - obese) using GPR6 probe; Figure
10C is a reference map of the corresponding region of the rat brain.

Figures 11A-F are copies of autoradiographs evidencing the results from in situ
hybridization (normal rat) using GPR12 probe.

Figure 12 is a copy of an autoradiograph evidencing the results from in situ hybridization
(normal rat) using GPR6 probe (12A), and orexin 1 receptor probe (12B) with overlays for
determination of co-localization of the two receptors (12C and 12D).

Figure 13 is a copy of an autoradiograph evidencing the results from in situ hybridization
(normal rat) using GPR6 probe (13A), and melanocortin-3 receptor probe (13B) with overlays for
determination of co-localization of the two receptors (13C and 13D).

Figure 14 provides results from a co-localization experiment, evidencing that GPR6 and
AGRP are co-localized within the arcuate. The arrow directs attention to a specific cell within
the arcuate, with the circle surrounding the cell; the "dots" are radiolabeled GPR6, and beneath
those, in a darker shade, is AGRP.

Figure 15 provides graphic results of body weight over time from animals (n = 5)
receiving antisense oligonucleotides to GPR6 (star symbol at Day 5 indicates day on which
animals received d-amphetamine sulfate injection; see Figure 16).

Figure 16 provides bar graph results from baseline locomotor activity and from
amphetamine-induced locomotive behavior in the animals of Figure 15.

Figure 17 provides bar-graph results from the direct identification of candidate compounds
screened against GPR3 Fusion Protein (Figure 17A) and GPR6 Fusion Protein (Figure 17B). 

Figure 18A-L is a sequence diagram of the preferred vector pCMV, including 
restriction enzyme site locations. 

DETAILED DESCRIPTION 

The scientific literature that has evolved around receptors has adopted a number of terms 
to refer to ligands having various effects on receptors. For clarity and consistency, the following 
definitions will be used throughout this patent document. To the extent that these definitions
conflict with other definitions for these terms, the following definitions shall control:

AGONISTS shall mean materials (e.g., ligands, candidate compounds) that activate the
intracellular response when they bind to the receptor, or enhance GTP binding to membranes.

AMINO ACID ABBREVIATIONS used herein are set out in Table 1:

PARTIAL AGONISTS shall mean materials (e.g., ligands, candidate compounds)
which activate the intracellular response when they bind to the receptor to a lesser
degree/extent than do agonists, or enhance GTP binding to membranes to a lesser degree/extent
than do agonists.

ANTAGONIST shall mean materials (e.g., ligands, candidate compounds) that
competitively bind to the receptor at the same site as the agonists but which do not activate the
intracellular response initiated by the active form of the receptor, and can thereby inhibit the
intracellular responses by agonists or partial agonists. ANTAGONISTS do not diminish the
baseline intracellular response in the absence of an agonist or partial agonist.

CANDIDATE COMPOUND shall mean a molecule (for example, and not limitation,
a chemical compound) which is amenable to a screening technique. Preferably, the phrase
"candidate compound" does not include compounds which were publicly known to be
compounds selected from the group consisting of inverse agonist, agonist or antagonist to a
receptor, as previously determined by an indirect identification process ("indirectly identified
compound"); more preferably, not including an indirectly identified compound which has
previously been determined to have therapeutic efficacy in at least one mammal; and, most
preferably, not including an indirectly identified compound which has previously been
determined to have therapeutic utility in humans.

COMPOSITION means a material comprising at least one component; a
"pharmaceutical composition" is an example of a composition.

COMPOUND EFFICACY shall mean a measurement of the ability of a compound
to inhibit or stimulate receptor functionality, as opposed to receptor binding affinity. A most
preferred means of detecting compound efficacy is via measurement of GTP (via [35S]GTPγS)
or cAMP, as further disclosed in the Example section of this patent document.

CONSTITUTIVELY ACTIVATED RECEPTOR (Constitutively Active Receptor)
shall mean a receptor subject to constitutive receptor activation. A constitutively activated 
receptor can be endogenous or non-endogenous. 

CONSTITUTIVE RECEPTOR ACTIVATION shall mean stabilization of a
receptor in the active state by means other than binding of the receptor with its endogenous 
ligand or a chemical equivalent thereof. 

CONTACT or CONTACTING shall mean bringing at least two moieties together, 
whether in an in vitro system or an in vivo system.

DIRECTLY IDENTIFYING or DIRECTLY IDENTIFIED, in relationship to the
phrase "candidate compound", shall mean the screening of a candidate compound against a 
constitutively activated receptor, preferably a constitutively activated orphan receptor, and most 
preferably against a constitutively activated G protein-coupled cell surface orphan receptor, and 
assessing the compound efficacy of such compound. This phrase is, under no circumstances, to 
be interpreted or understood to be encompassed by or to encompass the phrase "indirectly
identifying" or "indirectly identified."

ENDOGENOUS shall mean a material that a mammal naturally produces. 
ENDOGENOUS in reference to, for example and not limitation, the term "receptor," shall
mean that which is naturally produced by a mammal (for example, and not limitation, a
human) or a virus. By contrast, the term NON-ENDOGENOUS in this context shall mean
that which is not naturally produced by a mammal (for example, and not limitation, a human)
or a virus. For example, and not limitation, a receptor which is not constitutively active in its
endogenous form, but when manipulated becomes constitutively active, is most preferably
referred to herein as a "non-endogenous, constitutively activated receptor." Both terms can be
utilized to describe both "in vivo" and "in vitro" systems. For example, and not limitation, in a
screening approach, the endogenous or non-endogenous receptor may be in reference to an in
vitro screening system. As a further example and not limitation, where the genome of a
mammal has been manipulated to include a non-endogenous constitutively activated receptor, 
screening of a candidate compound by means of an in vivo system is viable. 

G PROTEIN COUPLED RECEPTOR FUSION PROTEIN and GPCR FUSION 
PROTEIN, in the context of the invention disclosed herein, each mean a non-endogenous 
protein comprising an endogenous, constitutively activated orphan GPCR fused to at least one 
G protein, most preferably, the alpha (α) subunit of such G protein (this being the subunit that
binds GTP), with the G protein preferably being of the same type as the G protein that
naturally couples with the endogenous orphan GPCR. For example, and not limitation, in an
endogenous state, the G protein "Gsα" is the predominant G protein that couples with GPR6
such that a GPCR Fusion Protein based upon GPR6 would be a non-endogenous protein
comprising GPR6 fused to Gsα. The G protein can be fused directly to the C-terminus of the
endogenous, constitutively active orphan GPCR, or there may be spacers between the two. 
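
The arrangement described in this definition can be illustrated, purely as a sketch and not as the disclosed construction method, by joining two coding sequences in frame. The function name `build_fusion_orf`, the choice of glycine codons (GGC) as spacer, and the toy sequences are assumptions introduced for this example only; removal of the receptor's stop codon reflects the requirement that the G protein be expressed together with the receptor.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_fusion_orf(gpcr_orf: str, g_protein_orf: str, spacer_codons: int = 0) -> str:
    """Join a GPCR coding sequence to a G protein coding sequence in frame."""
    gpcr = gpcr_orf.upper()
    g_protein = g_protein_orf.upper()
    for name, seq in (("GPCR", gpcr), ("G protein", g_protein)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of 3")
    # Drop the GPCR stop codon so translation can continue into the G protein.
    if gpcr[-3:] in STOP_CODONS:
        gpcr = gpcr[:-3]
    # Hypothetical spacer: whole codons only, so the reading frame is preserved.
    spacer = "GGC" * spacer_codons
    return gpcr + spacer + g_protein


# Toy sequences for illustration; these are not real receptor or G protein sequences.
toy_gpcr = "ATGGCTTGA"    # Met-Ala-stop
toy_g_alpha = "ATGGGCTGA"  # Met-Gly-stop
fusion = build_fusion_orf(toy_gpcr, toy_g_alpha, spacer_codons=2)
print(fusion)
```

Because the spacer is added only in whole codons, the G protein sequence stays in the same reading frame as the receptor regardless of spacer length.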

INDIRECTLY IDENTIFYING or INDIRECTLY IDENTIFIED means the 
traditional approach to the drug discovery process involving identification of an endogenous 
ligand specific for an endogenous receptor, screening of candidate compounds against the
receptor for determination of those which interfere and/or compete with the ligand-receptor
interaction, and assessing the efficacy of the compound for affecting at least one second
messenger pathway associated with the activated receptor. 

INHIBIT or INHIBITING, in relationship to the term "response" shall mean that a
response is decreased or prevented in the presence of a compound as opposed to in the absence 
of the compound. 

INVERSE AGONISTS shall mean materials (e.g., ligand, candidate compound)
which bind to either the endogenous form of the receptor or to the constitutively activated form
of the receptor, and which inhibit the baseline intracellular response initiated by the active form
of the receptor below the normal base level of activity which is observed in the absence of
agonists or partial agonists, or decrease GTP binding to membranes. Preferably, the baseline
intracellular response is inhibited in the presence of the inverse agonist by at least 30%, more
preferably by at least 50%, and most preferably by at least 75%, as compared with the baseline
response in the absence of the inverse agonist.
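
The percentage thresholds in this definition can be made concrete with a short calculation. The following is a minimal sketch; the function names, the tier labels, and the example readings are assumptions chosen for illustration, and only the 30%, 50% and 75% cut-offs come from the definition itself.

```python
def percent_inhibition(baseline: float, with_compound: float) -> float:
    """Percent decrease of the response relative to the baseline response."""
    if baseline <= 0:
        raise ValueError("baseline response must be positive")
    return 100.0 * (baseline - with_compound) / baseline


def preference_tier(inhibition_pct: float) -> str:
    """Map percent inhibition onto the tiers named in the definition."""
    if inhibition_pct >= 75.0:
        return "most preferred (at least 75%)"
    if inhibition_pct >= 50.0:
        return "more preferred (at least 50%)"
    if inhibition_pct >= 30.0:
        return "preferred (at least 30%)"
    return "below the preferred thresholds"


# Hypothetical readings in arbitrary units.
inhibition = percent_inhibition(baseline=200.0, with_compound=80.0)
print(inhibition, preference_tier(inhibition))
```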

LIGAND shall mean an endogenous, naturally occurring molecule specific for an 
endogenous, naturally occurring receptor. 

ORPHAN RECEPTOR shall mean an endogenous receptor for which the 
endogenous ligand specific for that receptor has not been identified or is not known. 

PHARMACEUTICAL COMPOSITION shall mean a composition comprising at 
least one active ingredient, whereby the composition is amenable to investigation for a 
specified, efficacious outcome in a mammal (for example, and not limitation, a human). Those 
of ordinary skill in the art will understand and appreciate the techniques appropriate for
determining whether an active ingredient has a desired efficacious outcome based upon the
needs of the artisan.

NON-ORPHAN RECEPTOR shall mean an endogenous naturally occurring 
molecule specific for an endogenous naturally occurring ligand wherein the binding of a ligand
to a receptor activates an intracellular signaling pathway.

STIMULATE or STIMULATING, in relationship to the term "response" shall mean 
that a response is increased in the presence of a compound as opposed to in the absence of the
compound. 

The order of the following sections is set forth for presentational efficiency and is not 
intended, nor should be construed, as a limitation on the disclosure or the claims to follow. 

A. Introduction

The traditional study of receptors has always proceeded from the a priori assumption
(historically based) that the endogenous ligand must first be identified before discovery could
proceed to find antagonists and other molecules that could affect the receptor. Even in cases where
an antagonist might have been known first, the search immediately extended to looking for the
endogenous ligand. This mode of thinking has persisted in receptor research even after the
discovery of constitutively activated receptors. What has not been heretofore recognized is that
it is the active state of the receptor that is most useful for discovering agonists, partial agonists, and
inverse agonists of the receptor. For those diseases which result from an overly active receptor,
what is desired in a therapeutic drug is a compound which acts to diminish the active state of a
receptor, not necessarily a drug which is an antagonist to the endogenous ligand. This is because
a compound (drug) which reduces the activity of the active receptor state need not bind at the same
site as the endogenous ligand. Thus, as taught by a method of this invention, any search for
therapeutic compounds should start by screening compounds against the ligand-independent active
state. The search, then, is for an inverse agonist to the active state receptor.

Screening candidate compounds against the endogenous, constitutively activated orphan
receptors, for example, and not limited to, the endogenous, constitutively active GPCRs set forth
herein, GPR3, GPR4, GPR6, GPR12, GPR21, GHSR, OGR1, RE2 and AL022171, allows for the
direct identification of candidate compounds which act at these orphan cell surface receptors,
without requiring any prior knowledge or use of the receptor's endogenous ligand. By
determining areas within the body where such receptors are expressed and/or over-expressed, it
is possible to determine related disease/disorder states which are associated with the expression 
and/or over-expression of these receptors; such an approach is disclosed in this patent document. 

B. Disease/Disorder Identification and/or Selection 

As will be set forth in greater detail below, most preferably inverse agonists to
endogenous, constitutively activated orphan receptors, such as those set forth herein (GPR3,
GPR4, GPR6, GPR12, GPR21, GHSR, OGR1, RE2 and AL022171) can be identified by the
methodologies of this invention. Such inverse agonists are ideal candidates as lead compounds
in drug discovery programs for treating diseases related to these receptors. Indeed, an antagonist
to such a receptor (even if the ligand were known) may be ineffective given that the receptor is
activated even in the absence of ligand-receptor binding. Because of the ability to directly identify
inverse agonists to these receptors, thereby allowing for the development of pharmaceutical
compositions, a search for diseases and disorders associated with these receptors is possible. For
example, scanning both diseased and normal tissue samples for the presence of these orphan
receptors now becomes more than an academic exercise or one which might be pursued along the
path of identifying an endogenous ligand. Tissue scans can be conducted across a broad range of
healthy and diseased tissues. Such tissue scans provide a preferred first step in associating a
specific receptor with a disease and/or a disorder.

Preferably, the DNA sequence of the endogenous, constitutively activated GPCR is used 
to make a probe for RT-PCR identification of the expression of the receptor in tissue samples. The
presence of a receptor in a diseased tissue, or the presence of the receptor at elevated
concentrations in diseased tissue compared to normal tissue, can be utilized to identify a
correlation with that disease. Receptors can equally well be localized to regions of organs by this
technique. Based on the known functions of the specific tissues to which the receptor is localized,
the putative functional role of the receptor can be deduced.
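
The comparison of receptor levels in diseased versus normal tissue can be sketched as a simple fold-change screen. This is an illustrative assumption only: the function name `elevated_in_disease`, the two-fold cut-off, and the expression values are hypothetical and are not taken from the experiments reported in this document.

```python
def elevated_in_disease(normal: dict, diseased: dict, fold_cutoff: float = 2.0) -> list:
    """Return (tissue, fold change) pairs where the diseased signal is elevated."""
    flagged = []
    for tissue in sorted(diseased):
        normal_level = normal.get(tissue, 0.0)
        diseased_level = diseased[tissue]
        if normal_level == 0.0:
            # Receptor detected only in the diseased sample.
            if diseased_level > 0.0:
                flagged.append((tissue, float("inf")))
        elif diseased_level / normal_level >= fold_cutoff:
            flagged.append((tissue, diseased_level / normal_level))
    return flagged


# Hypothetical relative band intensities from RT-PCR, in arbitrary units.
normal_tissue = {"hippocampus": 1.0, "frontal cortex": 2.0}
diseased_tissue = {"hippocampus": 3.0, "frontal cortex": 2.2}
print(elevated_in_disease(normal_tissue, diseased_tissue))
```

A tissue flagged in this way would only suggest a correlation to be examined further; it would not by itself establish a functional role for the receptor.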

C. Homology Identification

The identification and association of an orphan receptor with diseases and/or disorders can
be beneficially enhanced via identification of additional receptors having homology with the
original orphan receptor. This approach was utilized in the identification of both GPR6 and
GPR12, based upon their sequence homology with GPR3, and in the identification of AL022171,
having sequence homology to GPR21. GPR3 was previously identified as a constitutively
activated orphan receptor (see Eggerick, supra). What was not known, prior to this invention, was
that GPR6, GPR12, GPR21 and AL022171 are also constitutively active in their endogenous
states. Using known computerized databases (e.g., dbEST), GPR6, GPR12, GPR21 and
AL022171 were identified.
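
Sequence homology of the kind relied upon here is often summarized as percent identity over an alignment. The sketch below assumes two sequences that have already been aligned (gaps written as "-") and counts identical residues over the gap-free columns; this denominator is one of several conventions in use, and the function name and the toy peptide fragments are assumptions for illustration rather than actual receptor sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments; not taken from GPR3, GPR6, GPR12, GPR21 or AL022171.
identity = percent_identity("MNAS-LT", "MNGSALT")
print(round(identity, 1))
```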

This highlights certain unique benefits of the invention disclosed herein: because the
dogma in drug screening relies upon knowledge and identification of a receptor's endogenous
ligand, the art had no motivation to explore whether or not GPR3 homologs were constitutively
active in their endogenous forms (other than for, at best, academic curiosity). However, with the
power of the present invention to directly identify inverse agonists to such receptors, coupled with
the ability to locate the distribution of such receptors in tissue samples, the present invention
dramatically transcends such idle curiosity and provides a means for alleviating diseases and
disorders which impact the human condition.

D. Screening of Candidate Compounds

1. Generic GPCR screening assay techniques 

When a G protein receptor becomes constitutively active, it binds to a G protein (e.g., Gq,
Gs, Gi, Go) and stimulates the binding of GTP to the G protein. The G protein then acts as a
GTPase and slowly hydrolyzes the GTP to GDP, whereby the receptor, under normal conditions,
becomes deactivated. However, constitutively activated receptors continue to exchange GDP for
GTP. A non-hydrolyzable analog of GTP, [35S]GTPγS, can be used to monitor enhanced binding
to membranes which express constitutively activated receptors. It is reported that [35S]GTPγS can
be used to monitor G protein coupling to membranes in the absence and presence of ligand. An
example of this monitoring, among other examples well-known and available to those in the art,
was reported by Traynor and Nahorski in 1995. The preferred use of this assay system is for initial
screening of candidate compounds because the system is generically applicable to all G protein-
coupled receptors regardless of the particular G protein that interacts with the intracellular domain
of the receptor. It is in the context of the use of a GTP assay system that a GPCR Fusion Protein
is preferably utilized.
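
One way to express the enhanced binding described above is as a fold increase of mean [35S]GTPγS counts in receptor-expressing membranes over membranes from cells transfected with control vector. The following is a minimal sketch under that assumption; the function name and the replicate counts are hypothetical and do not reproduce the data of Figure 4.

```python
from statistics import mean


def fold_over_control(receptor_counts, control_counts) -> float:
    """Mean receptor-membrane counts divided by mean control-vector counts."""
    control_mean = mean(control_counts)
    if control_mean <= 0:
        raise ValueError("mean control counts must be positive")
    return mean(receptor_counts) / control_mean


# Hypothetical replicate scintillation counts (cpm) per well.
receptor_wells = [1500.0, 1600.0, 1400.0]
vector_wells = [500.0, 520.0, 480.0]
print(fold_over_control(receptor_wells, vector_wells))
```

A ratio well above 1 in the absence of any added ligand would be consistent with constitutive activity, subject to appropriate replicates and controls.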

2. Specific GPCR screening assay techniques

Once candidate compounds are identified using the "generic" G protein-coupled
receptor assay (i.e., an assay to select compounds that are agonists, partial agonists, or inverse
agonists), further screening to confirm that the compounds have interacted at the receptor site
is preferred. For example, a compound identified by the "generic" assay may not bind to the
receptor, but may instead merely "uncouple" the G protein from the intracellular domain.

In the case of GPR3, GPR4, GPR6, GPR12, GPR21, GHSR, OGR1, RE2 and AL022171, it has
been determined that these receptors couple to the G protein Gs. Gs stimulates the enzyme
adenylyl cyclase (Gi, on the other hand, inhibits this enzyme). Adenylyl cyclase catalyzes
the conversion of ATP to cAMP; thus, because these receptors are activated in their
endogenous forms, increased levels of cAMP are associated therewith (on the other hand,
endogenously activated receptors which couple to the Gi protein are associated with decreased
levels of cAMP). See, generally, "Indirect Mechanisms of Synaptic Transmission," Chpt. 8,
From Neuron To Brain (3rd Ed.) Nichols, J.G. et al. eds. Sinauer Associates, Inc. (1992). Thus,
assays that detect cAMP can be utilized to determine if a candidate compound is an inverse
agonist to the receptor (i.e., such a compound which contacts the receptor would decrease the
levels of cAMP relative to the uncontacted receptor). A variety of approaches known in the
art for measuring cAMP can be utilized; a most preferred approach relies upon the use of
anti-cAMP antibodies.

Another type of assay that can be utilized is a whole cell second
messenger reporter system assay. Promoters on genes drive the expression of the proteins that
a particular gene encodes. Cyclic AMP drives gene expression by promoting the binding of a
cAMP-responsive DNA binding protein or transcription factor (CREB) which then binds to the
promoter at specific sites called cAMP response elements and drives the expression of the gene.
Reporter systems can be constructed which have a promoter containing multiple cAMP response
elements before the reporter gene, e.g., β-galactosidase or luciferase. Thus, an activated Gs
receptor such as GPR3 causes the accumulation of cAMP which then activates the gene and
expression of the reporter protein. The reporter protein such as β-galactosidase or luciferase can
then be detected using standard biochemical assays (see, for example, Chen et al. 1995). A cAMP
assay is particularly preferred.
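
For a Gs-coupled receptor, the direction of the cAMP change relative to the uncontacted receptor indicates how a candidate compound might be categorized. The sketch below illustrates that logic only; the function name, the 10% tolerance band treated as "no clear effect," and the example values are assumptions, and for a Gi-coupled receptor the expected directions would be reversed.

```python
def classify_by_camp(untreated: float, treated: float, tolerance_pct: float = 10.0) -> str:
    """Categorize a compound by its effect on cAMP for a Gs-coupled receptor."""
    if untreated <= 0:
        raise ValueError("untreated cAMP level must be positive")
    change_pct = 100.0 * (treated - untreated) / untreated
    if change_pct <= -tolerance_pct:
        # cAMP falls below the level produced by the uncontacted receptor.
        return "inverse agonist candidate"
    if change_pct >= tolerance_pct:
        # cAMP rises above the constitutive level.
        return "agonist or partial agonist candidate"
    return "no clear effect"


# Hypothetical cAMP readings (pmol/well) without and with a candidate compound.
print(classify_by_camp(untreated=50.0, treated=20.0))
```

Distinguishing an agonist from a partial agonist would require comparing the size of the increase against a reference agonist, which this sketch does not attempt.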

The foregoing specific assay approach can, of course, be utilized to initially directly 
identify candidate compounds, rather than by using the generic assay approach. Such a 
selection is primarily a matter of choice of the artisan. With respect to GPR6, a
modified, commercially available cAMP assay was initially utilized for the direct
identification of inverse agonists.

3. GPCR Fusion Protein

The use of an endogenous, constitutively activated orphan GPCR for use in screening of
candidate compounds for the direct identification of inverse agonists, agonists and partial agonists
provides a unique challenge in that, by definition, the endogenous receptor is active even in the
absence of an endogenous ligand bound thereto. Thus, in order to differentiate between, e.g., the
endogenous receptor in the presence of a candidate compound and the endogenous receptor in the
absence of that compound, with an aim of such a differentiation to allow for an understanding as
to whether such compound may be an inverse agonist, agonist, partial agonist or have no effect
on such a receptor, it is preferred that an approach be utilized that can enhance such differentiation.
A preferred approach is the use of a GPCR Fusion Protein.

Generally, once it is determined that an endogenous orphan GPCR is constitutively active, 
using the assay techniques set forth above (as well as others), it is possible to determine the 
predominant G protein that couples with the endogenous GPCR. Coupling of the G protein to the 
GPCR provides a signaling pathway that can be assessed. Because it is most preferred that 
screening take place by use of a mammalian expression system, such a system will be expected 
to have endogenous G protein therein. Thus, by definition, in such a system, the endogenous, 
constitutively active orphan GPCR will continuously signal. In this regard, it is preferred that this 
signal be enhanced such that in the presence of, e.g., an inverse agonist to the receptor, it is more 
likely that one will be able to more readily differentiate, particularly in the context of screening, 
between the receptor when it is or is not contacted with the inverse agonist. 

The GPCR Fusion Protein is intended to enhance the efficacy of G protein coupling with 
the endogenous GPCR. The GPCR Fusion Protein appears to be important for screening with an 
endogenous, constitutively activated GPCR because such an approach increases the signal that is 
most preferably utilized in such screening techniques. Facilitating a significant "signal to noise" 
ratio is important for the screening of candidate compounds as disclosed herein. 

The preparation of a construct useful for expression of a GPCR Fusion Protein is within 
the purview of those having ordinary skill in the art. Commercially available expression vectors 
and systems offer a variety of approaches that can fit the particular needs of an investigator. One 
important criterion for such a GPCR Fusion Protein construct is that the endogenous GPCR 
sequence and the G protein sequence both be in-frame (preferably, the sequence for the 
endogenous GPCR is upstream of the G protein sequence) and that the "stop" codon of the GPCR 
be deleted or replaced such that upon expression of the GPCR, the G protein can also be 
expressed. The GPCR can be linked directly to the G protein, or there can be spacer residues 
between the two (preferably no more than about 12, although this number can be readily 
ascertained by one of ordinary skill in the art). We have evaluated both approaches, and in terms 
of measurement of the activity of the GPCR, the results are substantially the same; however, there 
is a preference (based upon convenience) for use of a spacer in that some restriction sites that are 
not used will, effectively, upon expression, become a spacer. Most preferably, the G protein that 
couples to the endogenous GPCR will have been identified prior to the creation of the GPCR 
Fusion Protein construct. Because there are only a few G proteins that have been identified, it is 
preferred that a construct comprising the sequence of the G protein (i.e., a universal G protein 
construct) be available for insertion of an endogenous GPCR sequence therein; this provides for 
efficiency in the context of large-scale screening of a variety of different endogenous GPCRs 
having different sequences. 
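
The construct criteria stated above (both sequences in-frame, GPCR upstream, GPCR "stop" codon removed, spacer of no more than about 12 residues) can be checked in silico before cloning. The following Python sketch is one possible way to do so; the function names and the toy sequences in the usage note are hypothetical and are not sequences disclosed herein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SPACER_RESIDUES = 12  # "preferably no more than about 12"


def codons(seq: str) -> list:
    """Split a nucleotide sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def build_fusion_orf(gpcr_orf: str, g_protein_orf: str, spacer: str = "") -> str:
    """Join GPCR + optional spacer + G protein into one open reading frame."""
    gpcr, spacer, g_protein = gpcr_orf.upper(), spacer.upper(), g_protein_orf.upper()
    for name, seq in (("GPCR", gpcr), ("spacer", spacer), ("G protein", g_protein)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3 (out of frame)")
    if len(spacer) // 3 > MAX_SPACER_RESIDUES:
        raise ValueError("spacer exceeds about 12 residues")
    # Delete the terminal stop codon of the GPCR so translation continues.
    if gpcr[-3:] in STOP_CODONS:
        gpcr = gpcr[:-3]
    upstream = gpcr + spacer
    # Any remaining in-frame stop would prevent expression of the G protein.
    if any(c in STOP_CODONS for c in codons(upstream)):
        raise ValueError("in-frame stop codon upstream of the G protein sequence")
    return upstream + g_protein
```

For example, `build_fusion_orf("ATGGCCTGGTAA", "ATGGGCTGCTGA", spacer="GGATCC")` treats an unused BamHI site as a two-residue spacer, in line with the convenience noted above.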

E. Medicinal Chemistry 

Generally, but not always, direct identification of candidate compounds is preferably 
conducted in conjunction with compounds generated via combinatorial chemistry techniques, 
whereby thousands of compounds are randomly prepared for such analysis. Generally, the 
results of such screening will be compounds having unique core structures; thereafter, these 
compounds are preferably subjected to additional chemical modification around a preferred 
core structure(s) to further enhance the medicinal properties thereof. In this way, inverse 
agonists, agonists and/or partial agonists that are directly identified can be beneficially 
improved upon prior to development of pharmaceutical compositions comprising such 
compounds. Generally, it is preferred that a directly identified 
compound selected for further refinement into a pharmaceutical composition have a binding 
affinity for the receptor of less than 100 nM, although this is generally a preference selection 
based upon the particular needs of the artisan. Such techniques are known to those in the art 
and will not be addressed in detail in this patent document. 

F. Pharmaceutical Compositions 

Candidate compounds selected for further development can be formulated into 
pharmaceutical compositions using techniques well known to those in the art. Suitable 
pharmaceutically-acceptable carriers are available to those in the art; for example, see Remington's 
Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Co., (Osol et al., eds.). 

EXAMPLES 

The following examples are presented for purposes of elucidation, and not limitation, 
of the present invention. While specific nucleic acid and amino acid sequences are disclosed 
herein, those of ordinary skill in the art are credited with the ability to make minor 
modifications to these sequences while achieving the same or substantially similar results 
reported below. It is intended that equivalent, endogenous, constitutively activated human 
orphan receptor sequences having eighty-five percent (85%) homology, more preferably 
having ninety percent (90%) homology, and most preferably having greater than ninety-five 
percent (95%) homology to GPR3, GPR4, GPR6, GPR12, GPR21, GHSR, OGR1, RE2 and 
AL022171 fall within the scope of any claims appended hereto. 
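
The homology thresholds recited above can be applied programmatically once two sequences have been aligned. The Python sketch below is offered only as an illustration: it assumes the sequences are already aligned to equal length with "-" as the gap character, and it uses percent identity as a simple stand-in for "homology", which is an assumption and not a definition given in this document. The function names and tier labels are likewise hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, ignoring gap-only columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def homology_tier(pct: float) -> str:
    """Map a percentage onto the 85% / 90% / >95% preferences stated above."""
    if pct > 95.0:
        return "most preferred"
    if pct >= 90.0:
        return "more preferred"
    if pct >= 85.0:
        return "preferred"
    return "below stated thresholds"
```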

Example 1 

Preparation of In Situ Probes 

In situ probes for GPR3, GPR6, and GPR12 were prepared. The following PCR 
protocol was utilized for all three probes: the reaction condition utilized was 1X rTth DNA 
polymerase buffer II, 1.5 mM Mg(OAc)2, 0.2 mM each of the 4 nucleotides, 0.228 µg rat 
genomic DNA, 0.25 µM of each primer (see below) and 1 unit of rTth DNA polymerase 
(Perkin Elmer) in 50 µl reaction volume. The cycle condition was 30 cycles of 94°C for 1 
min, 55°C for 1 min and 72°C for 45 sec with a Perkin Elmer Cetus 2400 thermal cycler. 

1. Rat GPR3 in situ probe 

Because the full length cDNA sequence for rat GPR3 is not available in the databases, the 
DNA fragment for the in situ probe was obtained by PCR using a 3' degenerate 
oligonucleotide based on the published human and mouse GPR3 sequences in the middle of 
the transmembrane domain 3, and a 5' degenerate oligonucleotide near the beginning of the 
5' extracellular domain. The sequences of the oligonucleotides utilized were as follows: 

5'-GGAGGATCCATGGCCTGGTTCTCAGC-3' (SEQ.ID.NO.: 1; 5' oligo) 
5'-CACAAGCTTAGRCCRTCCMGRCARTTCCA-3' (SEQ.ID.NO.: 2; 3' oligo) 

where R=A or G, and M=A or C. 

A 537 bp PCR fragment containing nucleotide 24 through to the middle of transmembrane 
domain 3 was digested with Bam HI and Hind III and was subcloned into a Bam HI-Hind III site of 
pBluescript. 
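
As an illustration of the degenerate 3' oligonucleotide described above, the following Python sketch expands the R and M positions into every concrete sequence and confirms which cloning site each GPR3 primer carries. It assumes that the blanks inside the printed 3' oligonucleotide are typographical and can be ignored; the function and variable names are hypothetical.

```python
from itertools import product

# Only the degeneracy codes defined in the text (R = A or G, M = A or C).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "M": "AC"}
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

FIVE_PRIME = "GGAGGATCCATGGCCTGGTTCTCAGC"          # SEQ.ID.NO.: 1
THREE_PRIME = "CACAAGCTTAGRCCRTCC MG RCA RTTCCA"  # SEQ.ID.NO.: 2, as printed


def expand_degenerate(oligo: str) -> list:
    """Return every non-degenerate sequence encoded by the oligo."""
    bases = oligo.upper().replace(" ", "")
    return ["".join(p) for p in product(*(IUPAC[b] for b in bases))]


def sites_in(oligo: str) -> list:
    """Names of the restriction sites found in the oligo."""
    seq = oligo.upper().replace(" ", "")
    return [name for name, site in SITES.items() if site in seq]


variants = expand_degenerate(THREE_PRIME)
print(len(variants), "variants")
print("5' oligo sites:", sites_in(FIVE_PRIME))
print("3' oligo sites:", sites_in(THREE_PRIME))
```

Four R positions and one M position give 2^5 distinct oligonucleotides, and the site check is consistent with the Bam HI and Hind III digestion used for subcloning.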

2. Rat GPR6 in situ probe 

The in situ probe DNA fragment of rat GPR6 was obtained by PCR based on the 
published rat GPR6 cDNA sequences. The sequences of the oligonucleotides utilized were as 
follows: 

5'-GGAGAAGGTTGTGGGGGGGATGAAGGCTAG-3' (SEQ.ID.NO.: 3; 5' oligo) 
5'-AGAGGATGGAGGTGGGTGGTAGGAAGAG-3' (SEQ.ID.NO.: 4; 3' oligo) 

A 608 bp PCR fragment containing nucleotide -10 through to the middle of transmembrane 
domain 4 was digested with Bam HI and Hind III and was subcloned into a Bam HI-Hind III 
site of pBluescript. 

3. Rat GPR12 in situ probe 

The in situ probe DNA fragment of rat GPR12 was obtained by PCR based on the 
published rat GPR12 cDNA sequences. The sequences of the oligonucleotides utilized were 
as follows: 

5'-CTTAAGCTTAAAATGAAGGAAGAGGGGAAG-3' (SEQ.ID.NO.: 5; 5' oligo) 
5'-GGAGGATCCCCAGAGGATCACTAGCAT-3' (SEQ.ID.NO.: 6; 3' oligo) 

A 516 bp PCR fragment containing nucleotide -5 through to the middle of transmembrane 
domain 4 was digested with Bam HI and Hind III and subcloned into a Bam HI-Hind III site 
of pBluescript. 

In situ probe sequences generated were as follows: 

Rat GPR3 probe: 

GGAGGATCCATGGCCTGGTTCTCAGCCGGCTCAGGCAGTGTGAATGTGAGCAT 

AGACCCAGCAGAGGAACCTACAGGCCCAGCTACACTGCTGCCCTCTCCCAGGG 

CCTGGGATGTGGTGCTGTGCATCTCAGGCACCCTGGTGTCCTGCGAGAATGCT 

CTGGTGATGGCCATCATTGTGGGCACGCCTGCCTTCCGCGCCCCCATGTTCCTG 

CTGGTGGGCAGCTTGGCCGTAGCAGACCTGCTGGCAGGCCTGGGCCTGGTCCT 

GCACTTCGCTGCTGACTTCTGTATTGGCTCACCAGAGATGAGCTTGGTGCTGGT 

TGGCGTGCTAGCAACGGCCTTTACTGCCAGCATCGGCAGCCTGCTGGCCATCA 

CCGTTGACCGCTACCTTTCCCTGTACAACGCCCTCACCTACTACTCAGAGACAA 

CAGTAACTCGAACCTACGTGATGCTGGCCTTGGTGTGGGTGGGTGCCCTGGGC 

CTGGGGCTGGTTCCCGTGCTGGCCTGGAACTGCCGGGACGGTCTAAGCTT 

(SEQ.ID.NO.: 7) 

Rat GPR6 probe: 

AAGCTTCTGGCGGCGATGAACGCTAGCGCCGCCGCGCrCAACGAGTCCCAGGTGGTGGCAGTAGCG 

GCCGAGGGAGCGGCAGCTGCGGCTACAGCAGCAGGGACACCGGACACCAGCGAATGGGGACCTCCG 

GCAGCATCa3CGGCGCTGGGAGGCGGCGGAGGACCTAACX3GGTCACrGGAGCTGTCTTCGCAGCTG 

CCCCCAGGACCCTCAGGACTTCTGCITTCGGCAGTGAATCCCTGGGATGTGCTGCTGTGCGTGTCGGG 

GACTGTGATCGCAGGCGAAAATGCGCTGGTGGTGGCGCTCATCGCATCCACTCCCGCGCTGCGCACG 

CCCATGTTTGTGCTCGTGGGTAGTCTGGCCACTGCTGACCTGCTGGCGGGCTGTGGCCrCATCCTACA 

CTTCGTGTTCCAGTACGTGGTGCCCrCGGAGACTGTGAGarrGCTCATGGTGGGCTTCCTGGTGQCGT 

CCTT(XjCCGCCTCAGTCAGCAGCCTGCrCGCrATCACAGTGGACCGTTACCTGTCCCrrTACAACGCG 

CTCACCTACTACTCGCGCCGGACCCTGTTGGGCGTGCACCTCTTGCTAGCAGCCACCTGGATCC 
(SEQ.ID.NO.: 8) 

Rat GPR12 probe: 

AAGCITAAAATGAACGAAGACCCGAAGGTCAATTTAAGCGGGCTGCCTCGGGACTGTATAGAAGCT 

GGTACTCCGGAGAACATCTCAGCCGCTGTCCCCTCCCAGGGCTCTGTTGTGGAGTCAGAACCCGAGC 

TCGTTGTCAACCCCTGGGACATTGTCTTGTGCAGCTCAGGAACCCTCATCTGCTGTGAAAATGCCG 

GTGGTCCITATCATCm'CCACAGCCCCAGCCTGCGAGCACCCATGTTCCTGCTGATAGGCAGCC^ 

TCTTGCAGACCTGCTGGCTGGTCTGGGACTCATCATCAATTTTGTTTTTGCCT 

AGCCACCAAGCTGGTCACAATTGGACTCATTGTCGCCTCTTTCTCTGCCTCTGTCTGC^ 

CTATCACTGTGGACCGCTACCTCTCGCTGTATTACGCCCTGACGTACCACTCCGAGAGGACCGTCACC 

TTTACCTATGTCATGCTAGTGATGCTCTGGGGATCC (SEQ.ID.NO.: 9) 

Example 2 
Receptor Expression 

1. cDNA and Vectors 

With respect to GPR3 and GPR6, expression vectors comprising cDNA were 
generously supplied by Brian O'Dowd (University of Toronto). The vector for GPR3 cDNA 
was pcDNA3; the vector for GPR6 was pRcCMV (the coding region for GPR6 was subcloned 
into pCMV vector at a Hind III-XbaI site). GPR12 cDNA was prepared using the following 
protocol: Human GPR12 cDNA was obtained by PCR using human genomic DNA and a 5' 
primer from the 5' untranslated region with a Hind III restriction site, and a 3' primer from 
the 3' untranslated region containing a Bam HI site. Primers had the following sequences: 

5'-CTTAAGCTTGTGGCATTTGGTACT-3' (SEQ.ID.NO.: 10; 5' oligo) 
5'-TCTGGATCCTTGGCCAGGCAGTGGAAGT-3' (SEQ.ID.NO.: 11; 3' oligo) 

PCR was performed using rTth polymerase (Perkin Elmer) with the buffer system provided 
by the manufacturer, 0.25 µM of each primer, 0.2 µM of each of the four nucleotides and 0.2 
µg of genomic DNA as template. The cycle condition was 30 cycles of 94°C for 1 min, 57°C 
for 1 min and 72°C for 1.5 min. The 1.2 kb PCR fragment was digested with Hind III and 
Bam HI, and subcloned into the Hind III-Bam HI site of pCMV expression vector. The resulting 
cDNA clones were fully sequenced and consistent with published sequences. 

With respect to GPR21, PCR was performed using genomic DNA as template and rTth 
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of 
each primer, and 0.2 mM of each of the four nucleotides. The cycle condition was 30 cycles 
of 94°C for 1 min, 62°C for 1 min and 72°C for 1 min and 20 sec. The 5' PCR primer was 
kinased and had the sequence: 

5'-GAGAATTGAGTGCTGAGGTGAAGATGAAGT-3' (SEQ.ID.NO.: 12) 

and the 3' primer contained a BamHI site with the sequence: 

5'-GGGGATCCGGGTAAGTGAGCGAGTTGAGAT-3' (SEQ.ID.NO.: 13). 

The resulting 1.1 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI 
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 14) and amino acid 
(SEQ.ID.NO.: 15) sequences for human GPR21 were thereafter determined. 

With respect to AL022171, PCR was performed using genomic DNA as template and 
rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 
µM of each primer, and 0.2 mM of each of the four nucleotides. The cycle condition was 30 
cycles of 94°C for 1 min, 54°C for 1 min and 72°C for 1 min and 20 sec. The 5' primer 
contained a HindIII site with the following sequence: 



5'-AGGAAGCTTTAAATTTCCAAGCCATGAATG-3' (SEQ.ID.NO.: 16) 

and the 3' primer contained an EcoRI site with the following sequence: 

5'-ACCGAATTCAGATTACATTTGATTTACTATG-3' (SEQ.ID.NO.: 17). 

The resulting 1.15 kb PCR fragment was digested with HindIII and EcoRI and cloned into the 
HindIII-EcoRI site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 18) and amino acid 
(SEQ.ID.NO.: 19) sequences for human AL022171 were thereafter determined and verified. 

With respect to GPR4 (GenBank accession number L36148), an expression vector 
comprising the cDNA was generously supplied by Brian O'Dowd (University of Toronto). 
The vector for GPR4 cDNA was pcDNA3, and the cDNA was subcloned into pCMV vector at a 
Hind III-XbaI site (the 5' untranslated region between HindIII and an ApaI site was trimmed by 
conducting digestion/self ligation). 

With respect to RE2 (GenBank accession number AF091890), PCR was performed 
using human brain cDNA as template and rTth polymerase (Perkin Elmer) with the buffer 
system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the four 
nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 62°C for 1 min and 72°C 
for 1 min and 30 sec. The 5' PCR primer contained an EcoRI site with the sequence 

5'-AGCGAATTCTGCCCACCCCACGCCGAGGTGCT-3' (SEQ.ID.NO.: 20) 

and the 3' primer contained a BamHI site with the sequence 

5'-TGCGGATCCGCCAGCTCTTGAGCCTGCACA-3' (SEQ.ID.NO.: 21). 

The 1.36 kb PCR fragment that resulted after two rounds of PCR was then digested with EcoRI 
and BamHI and cloned into the EcoRI-BamHI site of pCMV. Nucleic acid (SEQ.ID.NO.: 22) and 
amino acid (SEQ.ID.NO.: 23) sequences were thereafter determined. 

With respect to OGR1 (GenBank accession number U48405), PCR was performed 
using human genomic DNA as template and rTth polymerase (Perkin Elmer) with the buffer 
system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the four 
nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 62°C for 1 min and 72°C 
for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the sequence 

5'-GGAAGGTTGAGGGGGAAAGATGGGGAAGAT-3' (SEQ.ID.NO.: 24) 

and the 3' primer contained a BamHI site with the sequence 

5'-GTGGATGGACGGGGGGAGGAGGGAGGGTAG-3' (SEQ.ID.NO.: 25). 

The resulting 1.14 kb PCR fragment was digested with HindIII and BamHI and cloned into the 
HindIII-BamHI site of pCMV. Nucleic acid (SEQ.ID.NO.: 26) and amino acid (SEQ.ID.NO.: 27) 
sequences were thereafter determined. 

With respect to GHSR, PCR was performed using hippocampus cDNA as template 
and TaqPlus Precision polymerase (Stratagene) with the buffer system provided by the 
manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the four nucleotides. The cycle 
condition was 30 cycles of 94°C for 1 min, 68°C for 1 min and 72°C for 1 min and 10 sec. For 
the first round of PCR, the 5' PCR primer sequence was: 

5'-ATGTGGAAGGGGAGGGGGAGGG-3' (SEQ.ID.NO.: 40) 

and the 3' primer sequence was: 

5'-TGATGTATTAATAGTAGATTGT-3' (SEQ.ID.NO.: 41). 

Two microliters of the first round PCR was used as a template for the second round of PCR, 
where the 5' primer was kinased and had the sequence: 

5'-TAGGATGTGGAAGGGGAGGGGGAGGGAAGAGGGGGGGT-3' (SEQ.ID.NO.: 42) 

and the 3' primer contained an EcoRI site with the sequence: 

5'-CGGAATTCATGTATTAATACTAGATTCTGTCCAGGCCCG-3' (SEQ.ID.NO.: 43). 

The 1.1 kb PCR fragment was digested with EcoRI and cloned into the blunt-EcoRI site of CMVp 
expression vector. Nucleic acid (SEQ.ID.NO.: 44) and amino acid (SEQ.ID.NO.: 45) 
sequences for human GHSR were thereafter determined. 

2. Transfection procedure 

On day one, 1X10^ 293 or 293T cells per 150 mm plate were plated out. On day two, two 
reaction tubes were prepared (the proportions to follow for each tube are per plate): tube A was 
prepared by mixing between 8-20 µg DNA (e.g., pCMV vector; pCMV vector with receptor 
cDNA; pCMV with GPCR Fusion Protein, supra) in 1-2 ml serum free DMEM (Irvine Scientific, 
Irvine, CA); tube B was prepared by mixing 50-120 µl lipofectamine (Gibco BRL) in 1-2 ml serum 
free DMEM. Tubes A and B were then admixed by inversions (several times), followed by 
incubation at room temperature for 30-45 min. The admixture is referred to as the "transfection 
mixture". Plated cells were washed with 1X PBS, followed by addition of 10-12 ml serum free 
DMEM. 2.4 ml of the transfection mixture was then added to the cells, followed by incubation 
for 4 hrs at 37°C/5% CO2. The transfection mixture was then removed by aspiration, followed by 
the addition of 25 ml of DMEM/10% Fetal Bovine Serum. Cells were then incubated at 37°C/5% 
CO2. 

For GPCR Fusion Protein, preferred amounts to the above are as follows: 12 µg DNA; 2 ml 
serum free DMEM; 60 µl lipofectamine; 293 cells (and an addition of 12 ml serum free DMEM). 
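
Because the proportions above are stated per plate, scaling them to several plates is simple arithmetic. The Python sketch below is offered only as a convenience under stated assumptions: the defaults are the preferred GPCR Fusion Protein amounts, and the 2 ml of serum free DMEM is assumed to apply to each of tubes A and B. The function name and dictionary keys are hypothetical.

```python
def transfection_amounts(n_plates: int,
                         dna_ug: float = 12.0,
                         dmem_ml_per_tube: float = 2.0,
                         lipofectamine_ul: float = 60.0) -> dict:
    """Scale the per-plate tube A / tube B amounts to n_plates plates."""
    if n_plates < 1:
        raise ValueError("n_plates must be at least 1")
    return {
        "tube_A_dna_ug": n_plates * dna_ug,
        "tube_A_dmem_ml": n_plates * dmem_ml_per_tube,
        "tube_B_lipofectamine_ul": n_plates * lipofectamine_ul,
        "tube_B_dmem_ml": n_plates * dmem_ml_per_tube,
    }
```

Calling `transfection_amounts(3)` gives the master-mix amounts for three 150 mm plates; the general ranges (8-20 µg DNA, 50-120 µl lipofectamine) can be explored by overriding the defaults.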

Example 3 

Tissue Distribution of GPCR 

For some orphan receptors, it will be apparent to those in the art that there is an 
understanding of the distribution of such receptors within, e.g., a human, or associated with a 
disease state. However, for many orphan receptors, such information is not known, or will not be 
known. It is therefore preferred that some understanding of where such receptors may be 
distributed be obtained; this provides a predictive opportunity to associate 
a particular receptor with a disease state or disorder associated with the particular tissue where the 
receptor may be preferentially expressed. Using a commercially available mRNA dot-blot format, 
the distribution of endogenous, constitutively active GPCRs in various tissue types was assessed. 

Preferably, the entire coding region of the receptor is used to generate a radiolabeled 
probe using a Prime-It II™ Random Primer Labeling Kit (Stratagene, #300385), according 
to the manufacturer's instructions. As an example, this approach was utilized for GPR4. 

Human RNA Master Blot™ kit (Clontech, #7770-1) was hybridized with this probe 
and washed under stringent conditions, in accordance with manufacturer instructions. The 
blot was exposed to Kodak BioMax™ Autoradiography film overnight, at -80°C. Results are 
presented in Figure 3. Based upon these results, it is noted that GPR4 appears to be expressed 
throughout a variety of fetal tissue types (row G), as well as non-fetal heart (C1), and non-fetal 
lung (F1). This approach can be readily utilized for other receptors. 

Example 4 

GTP Membrane Binding Scintillation Proximity Assay 

When a G protein-coupled receptor is in its active state, either as a result of ligand binding 
or constitutive activation, the receptor binds to a G protein (in the case of GPR3, GPR4, GPR6, 
GPR12, GPR21, GHSR, OGR1, RE2 and AL022171, Gs) and stimulates the binding of GTP to 
the G protein. The trimeric G protein-receptor complex acts as a GTPase and slowly hydrolyzes 
the GTP to GDP, at which point the receptor normally is deactivated. Constitutively activated 
receptors continue to exchange GDP for GTP. The non-hydrolyzable GTP analog, [35S]GTPγS, 
can be utilized to demonstrate enhanced binding of [35S]GTPγS to membranes expressing 
constitutively activated receptors. The advantages of using [35S]GTPγS binding to measure 
constitutive activation are that: (a) it is generically applicable to all G protein-coupled receptors; and (b) 
it is proximal at the membrane surface, making it less likely to pick up molecules which affect the 
intracellular cascade. 

The assay utilizes the ability of G protein coupled receptors to stimulate [35S]GTPγS 
binding to membranes expressing the relevant receptors. The assay can, therefore, be used in 
the direct identification method to screen candidate compounds to known, orphan and 
constitutively activated G protein coupled receptors. The assay is generic and has application 
to drug discovery at all G protein coupled receptors. 

The [35S]GTPγS assay was incubated in 20 mM HEPES, pH 7.4, binding buffer with 12 
nM [35S]GTPγS and 75 µg membrane protein [e.g., 293T cells expressing GPR3] and 1 µM GDP 
for 1 hour. Wheatgerm agglutinin beads (25 µl; Amersham) were then added and the mixture was 
incubated for another 30 minutes at room temperature. The tubes were then centrifuged at 1500 
x g for 5 minutes at room temperature and then counted in a scintillation counter. 

Referring to Figure 4, GPR3 receptor was determined to have increased activity as 
compared to control; this heightened activity is not the result of autocrine stimulation in that the 
data were obtained from membrane preparations, as opposed to whole cell preparations. 
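
A comparison of receptor-expressing membranes against control membranes, as referred to above, is commonly summarized as a fold-over-control value. The Python sketch below shows one simple way to compute it from replicate scintillation counts; the function name is hypothetical, the use of a plain mean is an assumption, and no counts from Figure 4 are reproduced here.

```python
from statistics import mean


def fold_over_control(receptor_cpm: list, control_cpm: list) -> float:
    """Ratio of mean receptor counts to mean control counts."""
    if not receptor_cpm or not control_cpm:
        raise ValueError("both replicate lists must be non-empty")
    control_mean = mean(control_cpm)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return mean(receptor_cpm) / control_mean
```

A value above 1 indicates more [35S]GTPγS binding in the receptor membranes than in the control membranes; how far above 1 is meaningful depends on replicate variability, which this sketch does not assess.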

Example 5 

Receptor Homology Determination 

Following confirmation that GPR3 is a constitutively activated receptor, a homology 
search of the available G protein-coupled data banks (GenBank), using the commercially 
available program, DNA Star, identified two highly homologous receptors, GPR6 and GPR12 (see 
Figure 5A); both of these receptors are orphan receptors. While the sequence of these receptors 
was previously "known" (i.e., they were available on the databases), it was not known that these 
two receptors are constitutively activated in their endogenous forms (see Example 6, Figure 7). 
Furthermore, heretofore there would be no reason to search for such receptors for use in a drug 
discovery program in that the ligands therefor are not known or have not been identified. As 
such, the dogma approach to drug discovery would at best find the homology between GPR3, 
GPR6 and GPR12 of minor interest or, more likely, irrelevant. 

Example 6 

Analysis of Homologous Receptors For Constitutive Activation 

Although a variety of cells are available to the art for the expression of proteins, it is 
most preferred that mammalian cells be utilized. The primary reason for this is predicated 
upon practicalities, i.e., utilization of, e.g., yeast cells for the expression of a GPCR, while 
possible, introduces into the protocol a non-mammalian cell which may not (indeed, in the 
case of yeast, does not) include the receptor-coupling, genetic-mechanism and secretory 
pathways that have evolved for mammalian systems - thus, results obtained in 
non-mammalian cells, while of potential use, are not as preferred as those obtained from mammalian 
cells. Of the mammalian cells, COS-7, 293 and 293T cells are particularly preferred, although 
the specific mammalian cell utilized can be predicated upon the particular needs of the artisan. 

1. Analysis of GPR3, GPR6 and GPR12 

To generate a β-galactosidase reporter containing multiple Gal4 binding sites, a Bgl II/ 
Hind III fragment was removed from the somatostatin promoter-containing plasmid 
1.4(5xGal)CAT (Leonard, J. et al. (1992) PNAS USA 89:6247-6251) and cloned into pβgal-Basic 
(Promega). The Bgl II/Hind III fragment contains a variant of the minimal somatostatin promoter 
(from -71 bp to +50 bp relative to the transcription start site) in which the core 4 bp of the cAMP 
Response Element (-46 to -43) were replaced with 5 copies of the recognition sequence for the 
yeast transcription factor Gal4. When this reporter is co-transfected with an expression plasmid 
encoding a Gal4-CREB fusion protein, it is highly responsive to agents that increase the cAMP 
signaling pathway. 

VIP2.0ZC is a cell line that has been stably transfected with the reporter gene 
β-galactosidase under the control of a cAMP responsive VIP promoter (Konig et al. Mol. Cell. Neuro. 
1991, 2, 331-337). The cell line was used here to indirectly measure the accumulation of 
intracellular cAMP. Approximately 2 million cells were plated in a 6 cm plate the day before 
transfection. DNA (5 µg), for each receptor, was mixed with 2.5 ml serum-free DMEM 
containing 200 µg/ml DEAE dextran and 100 µM chloroquine, and added to a rinsed cell 
monolayer. After incubation for 90 min in a CO2 incubator, the transfection medium was 
removed. The cells were washed with serum-free medium and supplemented with fresh complete 
medium. Twenty four hours after transfection, the cells were replated into a 96-well plate at a 
density of 50 - 100 K per well and the β-galactosidase activity was assayed 48 to 72 hours after 
transfection. 

The assay buffer contained 100 mM sodium phosphate, 2 mM MgSO4, 0.1 mM MnCl2, 
pH 8.0. The cells were washed with PBS, and 25 µl/well of hypotonic lysis buffer consisting of 
0.1X assay buffer was added. Ten minutes later, 100 µl of assay buffer containing 0.5% Triton 
X-100 and 40 mM β-mercaptoethanol was added to each well and incubation at room temperature 
continued for 10 minutes. The substrate solution containing 5 mg/ml chlorophenol 
red-β-D-galactopyranoside (CPRG) in assay buffer was added at 25 µl/well and the plate was incubated 
at 37°C for 30 minutes before absorbance at 595 nm was measured with a plate reader. 
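
The buffer relationships in the preceding paragraph reduce to simple dilution arithmetic, sketched below in Python. The sketch derives the component concentrations of the 0.1X hypotonic lysis buffer from the 1X assay buffer and sums the three per-well additions; it assumes the additions are cumulative with no liquid removed between steps, and the variable and function names are hypothetical.

```python
# 1X assay buffer composition as stated in the text (concentrations in mM).
ASSAY_BUFFER_MM = {"sodium phosphate": 100.0, "MgSO4": 2.0, "MnCl2": 0.1}

# Per-well additions in µl: lysis buffer, detergent/reducing buffer, substrate.
WELL_ADDITIONS_UL = [25.0, 100.0, 25.0]


def dilute(buffer_mm: dict, strength: float) -> dict:
    """Concentrations of each component at the given fractional strength."""
    return {name: conc * strength for name, conc in buffer_mm.items()}


lysis_buffer_mm = dilute(ASSAY_BUFFER_MM, 0.1)   # the 0.1X hypotonic buffer
final_well_volume_ul = sum(WELL_ADDITIONS_UL)    # volume read by the plate reader

for name, conc in lysis_buffer_mm.items():
    print(f"{name}: {conc:.3g} mM")
print(f"final volume per well: {final_well_volume_ul:.0f} µl")
```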

GPR3, GPR6 and GPR12 were assayed using the foregoing system, and it was determined 
that both GPR6 and GPR12 are constitutively active. See Figure 6A. 

2. Analysis of GPR21 and AL022171 

293 and 293T cells were plated-out on 96 well plates at a density of 2 x 10^4 cells per 
well and were transfected, using Lipofectamine Reagent (BRL), the following day according 
to manufacturer instructions. A DNA/lipid mixture was prepared for each 6-well transfection 
as follows: 260 ng of plasmid DNA in 100 µl of DMEM were gently mixed with 2 µl of lipid 
in 100 µl of DMEM (the 260 ng of plasmid DNA consisted of 200 ng of a 8xCRE-Luc reporter 
plasmid, 50 ng of pCMV comprising endogenous receptor or non-endogenous receptor or 
pCMV alone, and 10 ng of a GPRS expression plasmid (GPRS in pcDNA3 (Invitrogen))). The 
8xCRE-Luc reporter plasmid was prepared as follows: vector SRIF-β-gal was obtained by 
cloning the rat somatostatin promoter (-71/+51) at the Bgl II-Hind III site in the pβgal-Basic 
Vector (Clontech). Eight (8) copies of cAMP response element were obtained by PCR from 
an adenovirus template AdpCF126CCRE8 (see 7 Human Gene Therapy 1883 (1996)) and 
cloned into the SRIF-β-gal vector at the Kpn-Bgl II site, resulting in the 8xCRE-β-gal reporter 
vector. The 8xCRE-Luc reporter plasmid was generated by replacing the beta-galactosidase 
gene in the 8xCRE-β-gal reporter vector with the luciferase gene obtained from the 
pGL3-basic vector (Promega) at the HindIII-BamHI site. Following 30 min. incubation at room 
temperature, the DNA/lipid mixture was diluted with 400 µl of DMEM and 100 µl of the 
diluted mixture was added to each well. 100 µl of DMEM with 10% FCS were added to each 
well after a 4 hr incubation in a cell culture incubator. The following day the medium on the 
transfected cells was changed to 200 µl/well of DMEM with 10% FCS. Eight (8) hours later, the wells 
were changed to 100 µl/well of DMEM without phenol red, after one wash with PBS. 
Luciferase activity was measured the next day using the LucLite™ reporter gene assay kit 
(Packard) following manufacturer instructions and read on a 1450 MicroBeta™ scintillation 
and luminescence counter (Wallac). Results are summarized in Figures 6B and 6C. 

GPR21 and AL022171 were assayed using the foregoing system, and based upon these 
results, it was determined that both GPR21 and AL022171 are constitutively active in their 
endogenous forms. See Figure 6B and 6C. 
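
The DNA/lipid mixture used in the foregoing luciferase protocol can be checked with a few lines of arithmetic. The Python sketch below totals the three plasmid amounts, totals the mixture volume after dilution, and derives how many wells one batch serves at 100 µl per well. It assumes the 2 µl of lipid adds to the total volume; the variable names and plasmid labels are hypothetical shorthand for the components named in the text.

```python
# Plasmid amounts per batch, in ng, as stated in the protocol.
PLASMID_NG = {
    "8xCRE-Luc reporter": 200,
    "pCMV with receptor (or pCMV alone)": 50,
    "third expression plasmid in pcDNA3": 10,
}

DNA_IN_DMEM_UL = 100    # DNA diluted in DMEM
LIPID_UL = 2            # lipid reagent
LIPID_DMEM_UL = 100     # DMEM used to dilute the lipid
DILUTION_DMEM_UL = 400  # DMEM added after the 30 min incubation
PER_WELL_UL = 100       # diluted mixture added to each well

total_dna_ng = sum(PLASMID_NG.values())
total_volume_ul = DNA_IN_DMEM_UL + LIPID_UL + LIPID_DMEM_UL + DILUTION_DMEM_UL
wells_per_batch = total_volume_ul // PER_WELL_UL
dna_per_well_ng = total_dna_ng * PER_WELL_UL / total_volume_ul

print(total_dna_ng, "ng DNA per batch")
print(wells_per_batch, "wells per batch")
print(round(dna_per_well_ng, 1), "ng DNA per well")
```

The plasmid amounts sum to the stated 260 ng, and the diluted volume covers six wells, which is consistent with the "6-well transfection" wording.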

3. Analysis of GPR4, RE2, OGR1 and GHSR 

Using the protocols defined herein, GPR4, RE2, OGR1 and GHSR were analyzed and 
determined to be constitutively active in their endogenous forms (data not shown). 

Example 7 

Tissue Distribution of GPR3, GPR6 and GPR12 

Tissue samples were examined for expression of these orphan receptors by comparative 
RT-PCR, using the following primers: 
GPR3: 

5'-CTGGTCCTGCACTTTGCTGC-3' (SEQ. ID. NO.: 28) 
5'-AGCATCACATAGGTCCGTGTCAC-3' (SEQ.ID.NO.: 29) 
These primers amplify a 194 bp fragment. 
GPR6: 

5'-ACCAGAAAGGGTGTGGGTACACTG-3' (SEQ. ID. NO.: 30) 
5'-GGAACGAAAGGGCACTTTGG-3' (SEQ. ID. NO.: 31) 
These primers amplify a 249 bp fragment. 

GPR12: 

5'-GCTGCCTCGGGATTATTTAG-3' (SEQ. ID. NO.: 32) 
5'-GCCTATTAGCAGGAACATGGGTG-3' (SEQ. ID. NO.: 33) 
These primers amplify a 220 bp fragment. 

These amplicons were designed to be non-overlapping, i.e., there is no sequence similarity 
between them, and to have similar Tm's, such that each primer pair amplifies its respective target 
at the same optimal annealing temperature. This diminishes the chance that an amplicon from one 
primer pair will act as an annealing target for the other primers in the multiplex reaction, therefore 
reducing the chance of interference with other primer pairs. 
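
The requirement that the primer pairs have similar Tm's can be examined with a rough estimate. The Python sketch below applies the Wallace rule (2°C per A or T, 4°C per G or C), a well-known approximation that is most reliable for short oligonucleotides. The method by which the Tm's were actually matched is not stated in this document, so the rule is an assumption here, and the primer sequences are transcribed from the listing above.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (°C) by the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


PRIMERS = {
    "GPR3": ("CTGGTCCTGCACTTTGCTGC", "AGCATCACATAGGTCCGTGTCAC"),
    "GPR6": ("ACCAGAAAGGGTGTGGGTACACTG", "GGAACGAAAGGGCACTTTGG"),
    "GPR12": ("GCTGCCTCGGGATTATTTAG", "GCCTATTAGCAGGAACATGGGTG"),
}

for receptor, pair in PRIMERS.items():
    print(receptor, [wallace_tm(p) for p in pair])
```

Because the rule ignores salt, primer concentration and nearest-neighbor effects, the printed values are best read as a relative comparison between primers and not as predicted annealing temperatures.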

Total RNA was extracted from tissue samples (human) using TRIzol™ Reagent 
(Gibco/BRL), following manufacturer instructions. cDNA was generated using 2mg total RNA 
and a First-Strand™ cDNA synthesis kit (Pharmacia). The cDNA samples were then diluted 1:3 
in H2O and comparative PCR was performed as described (Jensen, J. et al. (1996) J. Biol. Chem. 
271:18749) in the presence of [32P]dCTP. All reactions included the SP1-specific primers, which 
amplify a 300 bp fragment, to serve as an internal control. The primers outlined above, under 
defined PCR conditions (1 cycle: 95°C, 5 min; 23 cycles: 95°C, 30 sec, 58°C, 30 sec, 72°C, 1 min; 
1 cycle: 72°C, 10 min), gave consistently reliable and quantitatively accurate results. It was further 
determined that the selected primer pairs did not interfere with each other when multiplexed. PCR 
products were visualized by denaturing gel electrophoresis (7M urea, 5% polyacrylamide (Long 
Ranger™ Solution, AT Biochemical), 0.6X TBE) and subsequent autoradiography. 
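
The defined PCR conditions above can be written down as a small data structure, which makes the program easy to review and to total. The Python sketch below encodes the three stages and sums the programmed hold times; the representation and the function name are assumptions made for illustration, and ramp times between temperatures are not included.

```python
# Each stage: (number of cycles, [(temperature in °C, hold time in seconds), ...])
PCR_PROGRAM = [
    (1, [(95, 300)]),                       # initial denaturation, 5 min
    (23, [(95, 30), (58, 30), (72, 60)]),   # denature, anneal, extend
    (1, [(72, 600)]),                       # final extension, 10 min
]


def hold_time_seconds(program: list) -> int:
    """Total programmed hold time, excluding thermal ramping."""
    return sum(cycles * sum(seconds for _, seconds in steps)
               for cycles, steps in program)


total = hold_time_seconds(PCR_PROGRAM)
print(f"{total} s ({total / 60:.0f} min) of programmed hold time")
```

Keeping the cycle count as an explicit field reflects its importance for comparative PCR, where amplification is stopped after a limited number of cycles.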

Figures 7A, 7B, and 7C show the distribution of GPR3, GPR6 and GPR12 across human 
tissues. This information allows for assessing disease states that are associated with such tissue, 
as well as determining specific regions within such tissue where such expression predominates, 
thus allowing for correlating such receptor expression with particular disease states. This, in turn, 
then allows for direct identification of compounds that impact such receptors, without the need to 
understand or know the endogenous ligand for such receptor. Further screening reveals that GPR3 
is expressed in much higher levels in human epilepsy tissue samples (tissue source: temporal 
cortex), as compared with controls, as evidenced by RT-PCR analysis (Figure 8). 

Example 8A 

Functional Analysis - GPR6 (In Situ Analysis) 

The distribution of GPR6 in the hypothalamus suggested possible involvement in feeding 
behavior. Accordingly, a functional analysis of this receptor was undertaken. In situ analysis was 
conducted as follows: 

1. Probe Design 

GPR6 probe was produced from a 450 bp HindIII-ScaI fragment of the GPR6 receptor 
cloned into the HindIII-SmaI site of pBluescriptSK+. Riboprobes were produced using a T7 
transcription system in a standard labeling reaction consisting of: 1 µg of linearized plasmid, 
2 µl of 5x transcription buffer, 125 µCi of 35S-UTP, 150 µM of GTP, CTP and ATP, 12.5 mM 
dithiothreitol, 20U of RNase inhibitor and 6U of appropriate polymerase. The reaction was 
incubated at 37°C for 90 min., labeled probe being separated from free nucleotides over 
Sephadex G-50 spin columns. 

2. Tissue preparation 

Dissected tissue was frozen in isopentane cooled to -42°C and subsequently stored at 
-80°C prior to sectioning on a cryostat maintained at -20°C. Slide-mounted tissue sections 
were then stored at -80°C. 

3. In Situ Hybridization Protocol 

Tissue sections were removed from the -80°C freezer and incubated with a 1 µg/ml 
solution of proteinase-K to permeabilize the tissue and inactivate endogenous RNase. After 
this treatment, sections were incubated in succession in water (1 min), 0.1 M triethanolamine 
(pH 8.0; 1 min), and 0.25% acetic anhydride in 0.1 M triethanolamine (10 min). The tissue 
was then washed in 2 x SSC (0.3 M NaCl, 0.03 M Na citrate, pH 7.2; 5 min) and 
dehydrated through graded concentrations of ethanol. Sections were then hybridized with 
1.5 x 10^6 dpm of [35S]UTP-labeled cRNA probes in 20 µl of a hybridization buffer containing 
75% formamide, 10% dextran sulfate, 3 x SSC, 50 mM sodium phosphate buffer (pH 7.4), 
1 x Denhardt's solution, 0.1 mg/ml yeast tRNA, and 0.1 mg/ml sheared salmon sperm DNA. 
Tissue sections were covered with coverslips that were sealed with rubber cement. The slides 
were incubated overnight at 50°C. On the following day, the rubber cement was removed, the 
coverslips were soaked off with 2 x SSC, and the tissue sections were washed for 10 min in 
fresh 2 x SSC solution. Single-stranded probe not hybridized with endogenous mRNAs was 
removed by incubating the sections for 30 min in a 200 µg/ml solution of RNase-A at 37°C. 
The tissue was then washed in increasingly stringent SSC solutions (2, 1 and 0.5 x SSC; 10 
min each), followed by a 1 hr wash in 0.5 x SSC at 60°C. After this final wash, the tissue 
sections were dehydrated using graded concentrations of ethanol, air dried and prepared for 
detection by x-ray autoradiography on Kodak XAR-5 film. 

4. Analysis 

Utilizing the above protocol on normal male rats (Sprague-Dawley, Charles River), 
it was determined that GPR6 is expressed in the following areas of the brain: hypothalamus, 
hippocampus, nucleus accumbens, caudate and cerebral cortex. See Figure 9A for a 
representative tissue section (GPR6 receptor is presented in the dark areas; Figure 9B provides 
a reference map of the rat brain.) 

Given the high levels of expression of GPR6 in areas of the brain associated with 
feeding, an in situ analysis was conducted using the above protocol on both lean and obese 
male Zucker rats (Charles River). As those in the art appreciate, the Zucker animals are 
genetically bred to result in animals that exhibit a lean or obese phenotype. Figure 10A 
provides a representative tissue section of GPR6 receptor expression in the lean Zucker 
animals; Figure 10B provides a representative tissue section of GPR6 receptor expression in 
the obese Zucker animals; Figure 10C is a reference map of this section of the rat brain. 
These results support the position that the endogenous, constitutively activated orphan 
receptor GPR6 is relatively overexpressed in a model of obesity. 

Example 8B 

Functional Analysis - GPR12 (In Situ Analysis) 

In situ analysis for the GPR12 receptor was conducted as follows: 

1. Probe Design 

GPR12 probe was produced from a 515bp (NT5 - NT520) HindIII-BamHI fragment 
of the rat GPR12 receptor cloned into the HindIII-BamHI site of pBluescriptSK+. Riboprobes 
were produced using a T3/T7 transcription system in a standard labeling reaction consisting 
of: 1 µg of linearized plasmid, 2 µl of 5x transcription buffer, 125 µCi of 35S-UTP, 150 µM of 
GTP, CTP and ATP, 12.5mM dithiothreitol, 20U of RNase inhibitor and 6U of appropriate 
polymerase. The reaction was incubated at 37°C for 90 min., labeled probe being separated 
from free nucleotides over Sephadex G-50 spin columns. 

2. Tissue preparation 

Dissected tissue was frozen in isopentane cooled to -42°C and subsequently stored at 
-80°C prior to sectioning on a cryostat maintained at -20°C. Slide-mounted tissue sections 
were then stored at -80°C. 

3. In Situ Hybridization Protocol 

Tissue sections were removed from the -80°C freezer and incubated with a 1 µg/ml 
solution of proteinase-K to permeabilize the tissue and inactivate endogenous RNase. After 
this treatment, sections were incubated in succession in water (1 min), 0.1 M triethanolamine 
(pH 8.0; 1 min), and 0.25% acetic anhydride in 0.1 M triethanolamine (10 min). The tissue 
was then washed in 2 x SSC (0.3 M NaCl, 0.03 M Na citrate, pH 7.2; 5 min) and 
dehydrated through graded concentrations of ethanol. Sections were then hybridized with 
1.5 x 10^6 dpm of [35S]UTP-labeled cRNA probes in 20 µl of a hybridization buffer containing 
75% formamide, 10% dextran sulfate, 3 x SSC, 50 mM sodium phosphate buffer (pH 7.4), 
1 x Denhardt's solution, 0.1 mg/ml yeast tRNA, and 0.1 mg/ml sheared salmon sperm DNA. 
Tissue sections were covered with coverslips that were sealed with rubber cement. The slides 
were incubated overnight at 50°C. On the following day, the rubber cement was removed, the 
coverslips were soaked off with 2 x SSC, and the tissue sections were washed for 10 min in 
fresh 2 x SSC solution. Single-stranded probe not hybridized with endogenous mRNAs was 
removed by incubating the sections for 30 min in a 200 µg/ml solution of RNase-A at 37°C. 
The tissue was then washed in increasingly stringent SSC solutions (2, 1 and 0.5 x SSC; 10 
min each), followed by a 1 hr wash in 0.5 x SSC at 60°C. After this final wash, the tissue 
sections were dehydrated using graded concentrations of ethanol, air dried and prepared for 
detection by x-ray autoradiography on Kodak XAR-5 film. 

4. Analysis 

Utilizing the above protocol on normal male rats (Sprague-Dawley, Charles River), 
it was determined that GPR12 is expressed in the following areas of the brain: hippocampus 
(particularly in regions CA3, CA4 and the dentate gyrus; outer layers of the cerebral cortex; and 
the amygdala - all of these regions are well known in the art as associated with regions 
important for learning and memory); and thalamic relay nuclei, including the lateral geniculate 
nucleus, the medial geniculate nucleus and the lateral thalamic nucleus (regions related to 
lateral relay functions, e.g., vision and hearing). See Figures 11A-F for representative tissue 
sections (GPR12 receptor is presented in the dark areas). 

Example 8C 

Functional Analysis - Co-Localization of GPR6 With Feeding-Behavior Receptors 
(In Situ Analysis) 

The human orexin receptor OX1R, previously an orphan GPCR (a.k.a. "HFGAN72"), has 
been localized in the lateral hypothalamic region of the brain and has been hypothesized to be 
involved in regulation of feeding behavior. Sakurai, T. et al., 92(4) Cell 573 (1998). As noted in 
Sakurai, "pharmacological intervention directed at the orexin receptors may prove to be an 
attractive avenue toward the discovery of novel therapeutics for diseases involving disregulation 
of energy homeostasis, such as obesity and diabetes mellitus." Id. at 582. The melanocortin-3 
receptor (MC-3) has also been identified, Gantz, I. et al., 268(11) J. Biol. Chem. 8246 (1993), and 
is similarly associated with energy homeostasis. 

An understanding of the neural pathways involved in the regulation and disregulation of 
energy homeostasis is important for appreciation of hierarchical nuances that are critical for 
rational drug design. Merely affecting one receptor, particularly a receptor that is "downstream" 
of a more relevant receptor-pathway, may lead to a substantial expenditure of time and resources 
that ultimately results in the development of a pharmaceutical compound that may have little, if 
any, substantive impact on a particular disease state. For example, leptin, while clearly involved 
in some fashion with energy homeostasis, has not, to date, evidenced an opportunity for the 
development of a pharmaceutical product in the area of obesity. And while both the OX1 and 
MC-3 receptors (as well as other melanocortin receptors) are also, in some manner, involved with 
energy homeostasis, development of pharmaceuticals based upon the traditional receptor 
"antagonist" approach may prove to be more frustrating than fruitful if, for example, these 
receptors are not constitutively active in their endogenous forms, and, within the energy 
homeostasis pathway, there is a receptor that is constitutively active in its endogenous state. 
Indeed, the endogenous, constitutively active receptor would, by definition, continually signal 
whereas the endogenous, non-constitutively active receptors would require ligand-binding for such 
signaling. Thus, in the case of GPR6, which is not only constitutively active in its endogenous 
form, but also appears to be significantly up-regulated in an animal model of obesity, GPR6 
would, in essence, "trump" other energy homeostasis related receptors in that even with complete 
blockage via receptor antagonists to these receptors, GPR6 would continue signaling. Thus, a 
determination of whether these receptors (and others within the energy homeostasis pathway) are 
co-localized within discrete, neuronal regions, is useful in providing a more refined receptor target 
for drug development. 

In situ hybridization studies were performed as described above for GPR6, OX1R and 
MC-3 receptors. For GPR6, the in situ probe utilized was as set forth in Example 7A. For OX1R and 
MC-3, the probes were based upon the published rat sequences and were approximately 950bp and 
441bp, respectively. Tissue preparation (normal rats) and in situ hybridization were substantially 
the same as set forth in Example 8A. 

Results are presented in Figure 12 (GPR6 and OX1R) and Figure 13 (GPR6 and MC-3), 
where a red filter was used for GPR6 hybridization and a green filter was used for OX1R (Figure 
12B) and MC-3 (Figure 13B). Figures 12C and 12D (a magnified version of 12A) are generated 
by overlay of Figures 12A and 12B; co-localization is evidenced by areas having an orange color 
(from the combination of red and green). Thus, in Figure 12C, it can be seen that GPR6 and OX1R 
are co-localized in a sub-set of cells in the lateral arcuate and in the ventromedial hypothalamic 
nucleus, both of these regions being involved in the energy homeostasis pathways. A similar 
overlay-procedure for Figures 13A (GPR6) and 13B (MC-3) provides evidence that these receptors 
are co-localized primarily in the lateral arcuate. 
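
As an illustrative aside, the overlay logic described above can be sketched in a few lines of 
Python. The sketch assumes that each filtered image has been reduced to a grid of intensities 
between 0 and 1 and that a pixel counts as positive above a hypothetical threshold of 0.5; the 
threshold, the function names and the toy values are assumptions made for illustration and are 
not taken from this document. 

```python
# Minimal sketch of a red/green overlay. All names, the threshold and the
# toy intensities are illustrative assumptions, not the disclosed protocol.

def overlay(red, green):
    """Combine two equally sized intensity grids into (R, G, B) pixels."""
    return [
        [(r, g, 0.0) for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(red, green)
    ]


def colocalized(red, green, threshold=0.5):
    """Return (row, col) positions where both signals exceed the threshold.

    Pixels that carry both red and green signal are the ones that appear
    orange in the overlaid image.
    """
    return [
        (i, j)
        for i, (red_row, green_row) in enumerate(zip(red, green))
        for j, (r, g) in enumerate(zip(red_row, green_row))
        if r > threshold and g > threshold
    ]


# Toy 2 x 3 "images": red stands in for GPR6 signal, green for OX1R signal.
gpr6 = [[0.9, 0.1, 0.8],
        [0.0, 0.7, 0.2]]
ox1r = [[0.8, 0.9, 0.1],
        [0.0, 0.6, 0.9]]

merged = overlay(gpr6, ox1r)
both = colocalized(gpr6, ox1r)
print(both)  # [(0, 0), (1, 1)]
```

Only the two positions at which both toy signals are strong are reported, which mirrors the 
reading of orange areas as sites of co-localization. 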

Information continues to develop within the art as to the neural pathways associated with 
feeding behavior. An important component of this pathway is the neuropeptide agouti-related 
peptide (AGRP), sometimes referred to as agouti-related transcript (ART). The expression of 
AGRP is largely restricted to the arcuate nucleus (see Flier, J. S. and Maratos-Flier, E., and Figure 
1 therein). The cells that produce AGRP also produce neuropeptide Y (NPY). Animal studies 
have evidenced that administration of AGRP and administration of NPY lead to increases in 
feeding behavior and obesity. AGRP has also been shown to be an antagonist to the melanocortin 
4 (MC-4) receptor, and antagonism of the MC-4 receptor is also known to increase feeding 
behavior and obesity. Thus, AGRP appears to be involved in at least two pathways associated with 
feeding behavior. As set forth below, it has been discovered that the GPR6 receptor is co-localized 
within cells that produce AGRP, and based upon the results set forth below in Example 8, coupled 
with the fact that GPR6 is an endogenous, constitutively activated GPCR, it is apparent that GPR6 
is in some manner a potential "regulator" of the system - when expression of the GPR6 receptor 
is reduced via the use of antisense protocols (Example 9) there was an exceedingly rapid loss in 
body weight of the animals tested, suggesting that GPR6 may regulate the expression of AGRP. 

Unlike the "overlay" approach above, the protocol set forth in Marks, D.L. et al., 3 Mol. 
& Cell. Neuro. 395 (1992) was utilized for assessment of co-localization. AGRP (the AGRP 
cRNA probe was synthesized from a 382bp fragment of AGRP cDNA cloned into Bluescript SK 
vector) was analyzed in conjunction with radiolabeled GPR6 and both were found to be 
co-localized in the arcuate (see Figure 14). Given the role that AGRP plays with respect to 
homeostasis, and further given that GPR6 is constitutively active in its endogenous state, the results 
obtained from Example 9, infra, would be consistent with these data in that the almost immediate, 
significant loss of weight can be understood in the context of GPR6 influencing AGRP. 

Example 9 

Functional Analysis - GPR6 (In Vivo Analysis: GPR6 Antisense) 

Based upon the results developed from Example 7, and while not wishing to be bound by 
any particular theory, it was hypothesized that reduction in the expression of the GPR6 receptor 
would lead to a reduction in, inter alia, feeding behavior, metabolism, body weight, etc.; thus, by 
decreasing expression of this receptor via use of an antisense oligonucleotide, it was hypothesized 
that such animals would evidence changes in functional feeding behavior and/or feeding-related 
metabolism. Examination of this hypothesis was considered analogous to utilization of an inverse 
agonist to the receptor in that an inverse agonist would reduce the constitutive activity of the GPR6 
receptor, akin to reducing the expression of the receptor itself. It is noted that such an approach 
results in "knock-down", as opposed to "knock-out", of the receptor, i.e., in general, it is accepted 
that an antisense approach reduces expression of the target protein by approximately 30%. 

Sixteen adult male Sprague-Dawley rats (Harlan, San Diego) were used for this study. 
Animals were vivarium-acclimated for at least one week prior to use. Animals were housed 
(groups of two) in hanging plastic cages with food and water available ad lib. Animals were 
weighed and handled for at least one day prior to surgery (to establish baseline weight) and 
throughout the study (to assess the effects of the treatment). Daily food intake for pairs of 
animals in a cage was assessed by weighing the food in the feeding trough each morning 
before and after refilling. Groups included antisense (n = 5), missense (n = 4) and sterile 
water (n = 5). 

Surgeries were performed under sodium pentobarbital anesthesia (60 mg/kg), 
supplemented with halothane as necessary. Animals were stereotaxically implanted with a 
single cannula (brain infusion kit, Alza Pharmaceuticals) aimed at the lateral ventricle 
(bregma, AP -1.0, Lat -1.5, DV -3.8 from the surface of the brain). The inlet of the cannula 
was connected via flexible tubing to the outlet of an osmotic minipump (Model 2001, Alza 
Pharmaceuticals), that was implanted subcutaneously between the shoulder blades according 
to instructions provided by the manufacturer. 

Pumps contained antisense oligonucleotide 5'-GsCTAGCGTTCATCGCCGsC-3' 
(SEQ.ID.NO.:34; antisense) (wherein the small "s" denotes a phosphorothioate linkage) or 
missense oligonucleotide 5'-CsTGGACTGTATCGCCCCsG-3' (SEQ.ID.NO.:35; missense), or 
sterile water vehicle. Oligonucleotides were synthesized by Genset Corp and diluted to 2 µg/µl 
in sterile water. Because the pumps utilized deliver 1 µl/hour, animals received 48 µg/day of 
antisense or missense oligonucleotides, or 24 µl/day of sterile water. Pumps were primed prior 
to implant by incubation in sterile saline at 37°C for at least four hours. 
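
The daily dose quoted above follows directly from the pump flow rate and the oligonucleotide 
concentration. A minimal Python sketch of that arithmetic is given here; the function and 
variable names are illustrative assumptions, and the only inputs are the 1 µl/hour and 2 µg/µl 
figures stated in the text. 

```python
# Sketch of the minipump dosing arithmetic. The function and variable names
# are illustrative assumptions; the numbers are those stated in the text.

def daily_delivery(flow_ul_per_hour, conc_ug_per_ul, hours_per_day=24):
    """Return (volume in ul/day, mass in ug/day) delivered by the pump."""
    volume_ul = flow_ul_per_hour * hours_per_day
    mass_ug = volume_ul * conc_ug_per_ul
    return volume_ul, mass_ug


volume, mass = daily_delivery(flow_ul_per_hour=1.0, conc_ug_per_ul=2.0)
print(volume, mass)  # 24.0 48.0
```

The result reproduces the 24 µl/day of vehicle and 48 µg/day of oligonucleotide given above. 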

Five days after surgery, animals were treated with d-amphetamine sulfate; six days 
after surgery, baseline and amphetamine-stimulated locomotor behavior were examined; seven 
days after surgery, animals were euthanized and brains rapidly removed and frozen for 
histological analysis. 

Animals were taken from the vivarium to the testing room, placed into an open field 
enclosure (see below), and baseline activity assessed for 30 minutes. At the end of 30 
minutes, animals were briefly removed from the enclosure, injected with d-amphetamine 
sulfate (1.0 mg/kg s.c., diluted in sterile saline; National Institute on Drug Abuse Drug Supply 
Program), and immediately returned to the enclosure for 150 minutes. Locomotor behavior 
was quantified at 10 minute intervals in order to follow the time-course of baseline and 
amphetamine-stimulated activity. 

Baseline and amphetamine-stimulated locomotor behavior were assessed in a San 
Diego Instruments Flex Field System, consisting of 16" x 16" x 15" clear plexiglas open field 
enclosures. Photocell arrays (16 in each dimension) which surrounded the open fields were 
interfaced with a personal computer for collection of data. One array at 2" above the floor of 
an enclosure detected locomotor activity, and a second at 5" detected rearing behavior. The 
computer provided a variety of measures of locomotor activity, including total photocell beam 
breaks, time active, time resting, distance traveled, total number of rears, and time spent 
rearing (data not shown). During testing, the testing room was dimly lit by an overhead 
incandescent bulb, with white noise to mask outside sounds. 
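
The quantification at 10 minute intervals can be pictured with the following Python sketch. 
It assumes, purely for illustration, that beam breaks are available as timestamps in minutes 
from the start of the session and that the 30 minute baseline and 150 minute post-injection 
periods are treated as one continuous 180 minute record; the data format, names and toy 
timestamps are assumptions and do not describe the Flex Field software. 

```python
# Sketch of binning photocell beam breaks into 10 minute intervals.
# Timestamps, names and the continuous 180 minute session are assumptions.

def bin_counts(event_times_min, session_min=180, bin_min=10):
    """Count events falling into consecutive bins of bin_min minutes."""
    counts = [0] * (session_min // bin_min)
    for t in event_times_min:
        if 0 <= t < session_min:
            counts[int(t // bin_min)] += 1
    return counts


def split_phases(counts, baseline_min=30, bin_min=10):
    """Split binned counts into baseline and post-injection portions."""
    k = baseline_min // bin_min
    return counts[:k], counts[k:]


# Invented beam-break times (minutes) for a single animal.
events = [1.5, 9.9, 10.0, 29.0, 30.5, 31.0, 45.2, 179.9]
counts = bin_counts(events)
baseline, stimulated = split_phases(counts)
print(baseline)         # [2, 1, 1]
print(sum(stimulated))  # 4
```

Three baseline bins and fifteen post-injection bins result, giving the time-course resolution 
described above. 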

Results are presented in Figures 15 and 16. In Figure 15, it is noted that animals 
receiving the antisense oligonucleotide (GPR6 "knock-down" animals) had significantly 
greater loss of weight as compared with either the missense oligonucleotide-treated animals, 
or the control-treated animals. With respect to locomotor activity, the results of Figure 16 
support the position that the base-line and amphetamine-treatment locomotor activities were 
substantially the same across all three groups. 
Example 10 

GPCR Fusion Protein Preparation 

The design of the endogenous, constitutively activated GPCR-G protein fusion 
construct was accomplished as follows: both the 5' and 3' ends of the rat G protein Gsα (long 
form; Itoh, H. et al., 83 PNAS 3776 (1986)) were engineered to include a HindIII 
(5'-AAGCTT-3') sequence thereon. Following confirmation of the correct sequence (including 
the flanking HindIII sequences), the entire sequence was shuttled into pcDNA3.1(-) 
(Invitrogen, cat. no. V795-20) by subcloning using the HindIII restriction site of that vector. 
The correct orientation for the Gsα sequence was determined after subcloning into 
pcDNA3.1(-). The modified pcDNA3.1(-) containing the rat Gsα gene at HindIII sequence 
was then verified; this vector was now available as a "universal" Gsα protein vector. The 
pcDNA3.1(-) vector contains a variety of well-known restriction sites upstream of the HindIII 
site, thus beneficially providing the ability to insert, upstream of the Gs protein, the coding 
sequence of an endogenous, constitutively active GPCR. This same approach can be utilized 
to create other "universal" G protein vectors, and, of course, other commercially available or 
proprietary vectors known to the artisan can be utilized - the important criterion is that the 
sequence for the GPCR be upstream and in-frame with that of the G protein. 

Both GPR3-Gsα Fusion Protein construct and GPR6-Gsα Fusion Protein construct 
were then made as follows: primers were designed for both the GPR3 and GPR6. For GPR3, 
the primers were as follows: 

5'-gatcTCTAGAATGATGTGGGGTGCAGGCAGCC-3' (SEQ. ID. NO. 36; sense) 
5'-ctagGGTACCCGGACATCACTGGGGGAGCGGGATC-3' (SEQ. ID. NO. 37; antisense) 
The sense and anti-sense primers included the restriction sites for XbaI and KpnI, respectively. 
For GPR6, the primers were as follows: 

5'-gatcTCTAGAATGCAGGGTGCAAATCCGGCC-3' (SEQ. ID. NO. 38; sense) 
5'-ctagGGTACCCGGACCTCGCTGGGAGACCTGGAAC-3' (SEQ. ID. NO. 39; antisense) 
The sense and anti-sense primers also contained restriction sites for XbaI and KpnI, respectively. 
These restriction sites are available upstream of the HindIII site in the pcDNA3.1(-) vector. 
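
As a quick check on the primer design, the restriction sites can be located in the primer 
sequences with a short Python sketch. The recognition sequences used (XbaI: TCTAGA; KpnI: 
GGTACC) are the standard ones for these enzymes; the dictionary layout and function name are 
illustrative assumptions. 

```python
# Sketch: locate the XbaI and KpnI recognition sequences in the primers
# listed above. Data layout and function name are illustrative assumptions.

SITES = {"XbaI": "TCTAGA", "KpnI": "GGTACC"}

PRIMERS = {
    "GPR3 sense": ("gatcTCTAGAATGATGTGGGGTGCAGGCAGCC", "XbaI"),
    "GPR3 antisense": ("ctagGGTACCCGGACATCACTGGGGGAGCGGGATC", "KpnI"),
    "GPR6 sense": ("gatcTCTAGAATGCAGGGTGCAAATCCGGCC", "XbaI"),
    "GPR6 antisense": ("ctagGGTACCCGGACCTCGCTGGGAGACCTGGAAC", "KpnI"),
}


def find_site(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer.upper().find(SITES[enzyme])


positions = {name: find_site(seq, enz) for name, (seq, enz) in PRIMERS.items()}
for name, pos in positions.items():
    print(name, pos)
```

In each primer the site begins immediately after the four lower-case leading bases, and in the 
two sense primers the ATG start codon follows directly after the XbaI site. 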

PCR was then utilized to secure the respective receptor sequences for fusion within the 
Gsα universal vector disclosed above, using the following protocol for each: 100ng cDNA for 
GPR3 and GPR6 was added to separate tubes containing 2uL of each primer (sense and anti-sense), 
3uL of 10mM dNTPs, 10uL of 10X TaqPlus™ Precision buffer, 1uL of TaqPlus™ Precision 
polymerase (Stratagene: #600211), and 80uL of water. Reaction temperatures and cycle times for 
GPR3 were as follows: the initial denaturing step was done at 94°C for five minutes, followed by 
a cycle of 94°C for 30 seconds, 55°C for 30 seconds, and 72°C for two minutes (repeated 30 times). 
A final extension was done at 72°C for ten minutes. For GPR6, the initial denaturing step 
was done at 96°C for seven minutes, and a cycle of 96°C for 30 seconds, 55°C for 30 seconds, and 
72°C for two minutes was repeated 30 times. A final extension of ten minutes at 72°C was 
done for GPR6. Both PCR products for GPR3 and GPR6 were run on a 1% agarose gel and then 
purified (data not shown). Each purified product was digested with XbaI and KpnI (New England 
Biolabs) and the desired inserts were isolated, purified and ligated into the Gs universal vector at 
the respective restriction site. The positive clones were isolated following transformation and 
determined by restriction enzyme digest; expression using 293 cells was accomplished following 
the protocol set forth infra. Each positive clone for GPR3:Gs - Fusion Protein and GPR6:Gs - 
Fusion Protein was sequenced and made available for the direct identification of candidate 
compounds. 
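
For planning purposes, the two thermocycling programs and the reaction mix above can be 
tabulated with a small Python sketch. It counts programmed hold times only (ramp times between 
temperatures are ignored) and omits the cDNA volume, which is not stated in the text; the 
function and variable names are illustrative assumptions. 

```python
# Sketch: programmed hold time of each PCR program and the volume of the
# stated reaction components. Ramp times are ignored and the cDNA volume is
# omitted because it is not stated; names are illustrative assumptions.

def program_minutes(initial_min, cycle_steps_sec, n_cycles, final_min):
    """Total programmed hold time in minutes."""
    cycle_min = sum(cycle_steps_sec) / 60
    return initial_min + n_cycles * cycle_min + final_min


# (denature, anneal, extend) hold times in seconds, as stated in the text.
STEPS = [30, 30, 120]

gpr3_minutes = program_minutes(5, STEPS, 30, 10)
gpr6_minutes = program_minutes(7, STEPS, 30, 10)

# Stated component volumes in microliters (cDNA volume not given).
mix_ul = {
    "sense primer": 2,
    "anti-sense primer": 2,
    "10mM dNTPs": 3,
    "10X buffer": 10,
    "polymerase": 1,
    "water": 80,
}

print(gpr3_minutes, gpr6_minutes)  # 105.0 107.0
print(sum(mix_ul.values()))        # 98
```

The two programs differ only in the initial denaturing step, so the GPR6 program runs two 
minutes longer than the GPR3 program. 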

GPCR Fusion Proteins were analyzed as above and verified to be constitutively active 
(data not shown). 
Example 11 

Protocol: Direct Identification of Inverse Agonists and Agonists Using [35S]GTPγS 

Although we have utilized endogenous, constitutively active GPCRs for the direct 
identification of candidate compounds as, e.g., inverse agonists, for reasons that are not altogether 
understood, intra-assay variation can become exacerbated. Preferably, then, a GPCR Fusion 
Protein, as disclosed above, is utilized. We have determined that when such a protein is used, 
intra-assay variation appears to be substantially stabilized, whereby an effective signal-to-noise 
ratio is obtained. This has the beneficial result of allowing for a more robust identification of 
candidate compounds. 

It is important to note that the following results have been obtained using an orphan 
receptor; as that data support, it is possible, using the techniques disclosed herein, to directly 
identify candidate compounds that modulate the orphan receptor as inverse agonists, agonists and 
partial agonists, directly from a primary screen; indeed, the methods disclosed herein are sensitive 
enough to allow for direct identification of both inverse agonist and agonist modulators on the 
same assay plate. 

1. Membrane Preparation 

Membranes comprising the endogenous, constitutively active orphan GPCR fusion protein 
of interest (see Examples 2 and 10) and for use in the direct identification of candidate compounds 
as inverse agonists, agonists or partial agonists were prepared as follows: 

(a) Materials 

Membrane Scrape Buffer was comprised of 20mM HEPES and 10mM EDTA, pH 7.4; 
Membrane Wash Buffer was comprised of 20 mM HEPES and 0.1 mM EDTA, pH 7.4; Binding 
Buffer was comprised of 20mM HEPES, 100 mM NaCl, and 10 mM MgCl2, pH 7.4. 

(b) Procedure 

All materials were kept on ice throughout the procedure. Firstly, the media was aspirated 
from a confluent monolayer of cells, followed by rinse with 10ml cold PBS, followed by 
aspiration. Thereafter, 5ml of Membrane Scrape Buffer was added to scrape cells; this was 
followed by transfer of cellular extract into 50ml centrifuge tubes (centrifuged at 20,000 rpm for 
17 minutes at 4°C). Thereafter, the supernatant was aspirated and the pellet was resuspended in 
30ml Membrane Wash Buffer followed by centrifugation at 20,000 rpm for 17 minutes at 4°C. The 
supernatant was then aspirated and the pellet resuspended in Binding Buffer. This was then 
homogenized using a Brinkman polytron™ homogenizer (15-20 second bursts until all the material 
was in suspension). This is referred to herein as "Membrane Protein". 

2. Bradford Protein Assay 

Following the homogenization, protein concentration of the membranes was 
determined using the Bradford Protein Assay (protein can be diluted to about 1.5mg/ml, 
aliquoted and frozen (-80°C) for later use; when frozen, protocol for use is as follows: on the 
day of the assay, frozen Membrane Protein is thawed at room temperature, followed by vortex 
and then homogenized with a polytron at about 12 x 1,000 rpm for about 5-10 seconds; it is 
noted that for multiple preparations, the homogenizer should be thoroughly cleaned between 
homogenization of different preparations). 

(a) Materials 

Binding Buffer (as per above); Bradford Dye Reagent; Bradford Protein Standard were 
utilized, following manufacturer instructions (Biorad, cat. no. 500-0006). 

(b) Procedure 

Duplicate tubes were prepared, one including the membrane, and one as a control 
"blank". Each contained 800ul Binding Buffer. Thereafter, 10ul of Bradford Protein Standard 
(1mg/ml) was added to each tube, and 10ul of Membrane Protein was then added to just one 
tube (not the blank). Thereafter, 200ul of Bradford Dye Reagent was added to each tube, 
followed by vortex of each. After five (5) minutes, the tubes were re-vortexed and the 
material therein was transferred to cuvettes. The cuvettes were then read using a CECIL 3041 
spectrophotometer, at a wavelength of 595 nm. 

3. Direct Identification Assay 

(a) Materials 

GDP Buffer consisted of 37.5 ml Binding Buffer and 2mg GDP (Sigma, cat. no. G-7127), 
followed by a series of dilutions in Binding Buffer to obtain 0.2 uM GDP (final concentration of 
GDP in each well was 0.1 uM GDP); each well comprising a candidate compound had a final 
volume of 200ul consisting of 100ul GDP Buffer (final concentration, 0.1 uM GDP), 50ul 
Membrane Protein in Binding Buffer, and 50ul [35S]GTPγS (0.6 nM) in Binding Buffer (2.5 ul 
[35S]GTPγS per 10ml Binding Buffer). 
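
The per-well figures quoted in this protocol can be cross-checked with a short Python sketch. 
It uses the volumes and concentrations stated in the Materials above and in the Procedure that 
follows (200 ul total assay volume, 5 ul of candidate compound, 50 ul of Membrane Protein at 
0.25 mg/ml); the function and variable names are illustrative assumptions, and the compound 
stock concentration is only what the stated 1:40 ratio and 10 uM final concentration imply. 

```python
# Sketch of the per-well dilution arithmetic. Names are illustrative
# assumptions; the input numbers are those stated in the protocol text.

def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Concentration after diluting added_volume_ul into total_volume_ul."""
    return stock_conc * added_volume_ul / total_volume_ul


TOTAL_UL = 200

# 100 ul of 0.2 uM GDP Buffer in a 200 ul well.
gdp_final_um = final_concentration(0.2, 100, TOTAL_UL)

# 5 ul of compound in 200 ul is the 1:40 ratio given in the Procedure.
dilution_factor = TOTAL_UL / 5
implied_stock_um = 10 * dilution_factor  # implied by the 10 uM final figure

# 50 ul of membranes at 0.25 mg/ml (equal to 0.25 ug/ul) per well.
protein_ug_per_well = 0.25 * 50

print(round(gdp_final_um, 3), dilution_factor, implied_stock_um,
      protein_ug_per_well)
```

The sketch reproduces the 0.1 uM GDP, 1:40 and 12.5 ug/well figures given in the text. 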

(b) Procedure 

Candidate compounds (Tripos, Inc., St. Louis, MO) were received in 96-well plates (these 
can be frozen at -80°C). Membrane Protein (or membranes with expression vector excluding the 
GPCR Fusion Protein, as control), were homogenized briefly until in suspension. Protein 
concentration was then determined using the Bradford Protein Assay set forth above. Membrane 
Protein (and control) was then diluted to 0.25mg/ml in Binding Buffer (final assay concentration, 
12.5ug/well). Thereafter, 100 ul GDP Buffer was added to each well of a Wallac Scintistrip™ 
(Wallac). A 5ul pin-tool was then used to transfer 5 ul of a candidate compound into such well 
(i.e., 5ul in total assay volume of 200 ul is a 1:40 ratio such that the final screening concentration 
of the candidate compound is 10uM). Again, to avoid contamination, after each transfer step the 
pin tool was rinsed in three reservoirs comprising water (1X), ethanol (1X) and water (2X) - 
excess liquid should be shaken from the tool after each rinse and dried with paper and kimwipes. 
Thereafter, 50 ul of Membrane Protein was added to each well (a control well comprising 
membranes without the GPCR Fusion Protein is also utilized), and pre-incubated for 5-10 minutes 
at room temperature (the plates were covered with foil in that the candidate compounds obtained 
from Tripos are light sensitive). Thereafter, 50 ul of [35S]GTPγS (0.6 nM) in Binding Buffer was 
added to each well, followed by incubation on a shaker for 60 minutes at room temperature (again, 
in this example, plates were covered with foil). The assay was then stopped by spinning of the 
plates at 4000 RPM for 15 minutes at 22°C. The plates were then aspirated with an 8 channel 
manifold and sealed with plate covers. The plates were then read on a Wallac 1450 using setting 
"Prot. #37" (as per manufacturer instructions). 

Exemplary results are presented in Figure 17A (GPR3:Gs Fusion Protein) and Figure 17B 
(GPR6:Gs Fusion Protein) where each designation is a well comprising a different candidate 
compound, standard deviations based upon the mean results of each plate are in dashed lines and 
the vertical lines are the percent response. Note in Figure 17A well designation C4 - this 
compound was directly identified as an inverse agonist to the GPR3 receptor. Note in Figure 17B 
wells designated G7 and H9 - these compounds were directly identified as an inverse agonist and 
an agonist, respectively, to the GPR6 receptor. In both cases, these are orphan receptors. 
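
One way to picture this plate-based readout is the following Python sketch, which expresses 
each well relative to the plate mean and flags wells lying beyond a chosen number of standard 
deviations. The cutoff of two standard deviations, the definition of percent response relative 
to the plate mean, the well labels and the counts are all assumptions made for illustration; 
the text does not state the cutoff used, and the numbers are not data from Figure 17. 

```python
# Sketch of plate-mean based hit calling. The 2 SD cutoff, the percent
# response definition, well labels and counts are illustrative assumptions.
from statistics import mean, stdev


def percent_response(counts):
    """Express each well's signal as percent change from the plate mean."""
    plate_mean = mean(list(counts.values()))
    return {w: 100.0 * (c - plate_mean) / plate_mean for w, c in counts.items()}


def call_hits(counts, n_sd=2.0):
    """Flag wells whose signal lies beyond n_sd standard deviations.

    With a constitutively active receptor, a signal well below the plate
    mean suggests an inverse agonist; one well above suggests an agonist.
    """
    values = list(counts.values())
    plate_mean, plate_sd = mean(values), stdev(values)
    hits = {}
    for well, c in counts.items():
        if c < plate_mean - n_sd * plate_sd:
            hits[well] = "candidate inverse agonist"
        elif c > plate_mean + n_sd * plate_sd:
            hits[well] = "candidate agonist"
    return hits


# Invented counts for ten wells of a hypothetical plate.
plate = {"A1": 1000, "A2": 1010, "A3": 990, "A4": 1005, "A5": 995,
         "A6": 1000, "A7": 1002, "A8": 998, "A9": 400, "A10": 1600}

hits = call_hits(plate)
print(hits)
```

Because the receptor signals constitutively, both directions of deviation are informative, 
which is consistent with inverse agonists and agonists being identified on the same plate. 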

It is preferred that following such direct identification, IC50 (inverse agonist) or EC50 
(agonist) values be determined; those having ordinary skill in the art are credited with utilizing IC50 
and EC50 assay protocols of choice. 
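
By way of illustration only, one common way of estimating an IC50 from a concentration-response 
series is sketched below in Python. The four-parameter logistic model, the log-linear 
interpolation, the function names and the simulated values are assumptions chosen for the 
example and are not the protocol of this document; in practice a curve-fitting package would 
typically be used. 

```python
# Sketch: estimate an IC50 by log-linear interpolation of a decreasing
# concentration-response series. Model, names and simulated values are
# illustrative assumptions, not a protocol from this document.
import math


def hill_response(conc, bottom, top, ic50, slope=1.0):
    """Four-parameter logistic; response falls from top to bottom."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)


def interpolate_ic50(concs, responses):
    """Concentration giving the half-maximal response.

    Assumes concs are ascending and responses decrease with concentration
    (the inverse agonist case); an increasing series would need the
    comparison reversed to yield an EC50.
    """
    half = (max(responses) + min(responses)) / 2.0
    for i in range(len(concs) - 1):
        r1, r2 = responses[i], responses[i + 1]
        if r1 >= half >= r2 and r1 != r2:
            frac = (r1 - half) / (r1 - r2)
            log1, log2 = math.log10(concs[i]), math.log10(concs[i + 1])
            return 10 ** (log1 + frac * (log2 - log1))
    raise ValueError("half-maximal response not bracketed by the data")


# Simulated six-point series (molar) with a true IC50 of 3e-7 M.
concs = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
responses = [hill_response(c, 0.0, 100.0, 3e-7) for c in concs]

estimate = interpolate_ic50(concs, responses)
print(f"{estimate:.2e}")
```

With ten-fold concentration steps the interpolated value lands close to the simulated IC50; 
finer dilution steps or a full curve fit would tighten the estimate. 
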
Example 12 

Protocol: Confirmation Assay 

After using an independent assay approach to provide a directly identified candidate 
compound as set forth above, it is preferred that a confirmation assay then be utilized. In this case, 
the preferred confirmation assay is a cyclase-based assay. 

A modified Flash Plate™ Adenylyl Cyclase kit (New England Nuclear, Cat. No. 
SMP004A) was utilized for confirmation of candidate compounds directly identified as inverse 
agonists and agonists to endogenous, constitutively activated orphan GPCRs in accordance with 
the following protocol. 

Transfected cells were harvested approximately three days after transfection. Membranes 
were prepared by homogenization of suspended cells in buffer containing 20mM HEPES, pH 7.4 
and 10mM MgCl2. Homogenization was performed on ice using a Brinkman Polytron™ for 
approximately 10 seconds. The resulting homogenate was centrifuged at 49,000 X g for 15 
minutes at 4°C. The resulting pellet was then resuspended in buffer containing 20mM HEPES, 
pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by centrifugation at 49,000 X 
g for 15 minutes at 4°C. The resulting pellet can be stored at -80°C until utilized. On the day of 
direct identification screening, the membrane pellet is slowly thawed at room temperature, 
resuspended in buffer containing 20mM HEPES, pH 7.4 and 10mM MgCl2, to yield a final 
protein concentration of 0.60mg/ml (the resuspended membranes are placed on ice until use). 

cAMP standards and Detection Buffer (comprising 2 µCi of tracer [125I] cAMP (100 µl) to 
11 ml Detection Buffer) were prepared and maintained in accordance with the manufacturer's 
instructions. Assay Buffer was prepared fresh for screening and contained 20mM HEPES, pH 7.4, 
10mM MgCl2, 20mM phosphocreatine (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 
µM GTP (Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized. 

Candidate compounds identified as per above (if frozen, thawed at room temperature) were 
added to plate wells (3µl/well; 12µM final assay concentration), together with 40 µl Membrane 
Protein (30µg/well) and 50µl of Assay Buffer. This admixture was then incubated for 30 minutes 
at room temperature, with gentle shaking. 

Following the incubation, 100µl of Detection Buffer was added to each well, followed by 
incubation for 2-24 hours. Plates were then counted in a Wallac MicroBeta™ plate reader using 
"Prot. #31" (as per manufacturer instructions). 
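The per-well quantities in the protocol can be checked with a few lines of arithmetic. The sketch below assumes that the 12 µM "final assay concentration" refers to the 93 µl admixture before Detection Buffer is added; the text does not say so explicitly, and the function name `well_summary` is illustrative.

```python
# Volumes and concentrations as stated in the protocol.
COMPOUND_UL = 3.0           # candidate compound per well
MEMBRANE_UL = 40.0          # membrane suspension per well
ASSAY_BUFFER_UL = 50.0      # Assay Buffer per well
FINAL_COMPOUND_UM = 12.0    # stated final assay concentration
MEMBRANE_MG_PER_ML = 0.60   # stated concentration of resuspended membranes


def well_summary():
    """Derive admixture volume, implied compound stock and protein per well."""
    total_ul = COMPOUND_UL + MEMBRANE_UL + ASSAY_BUFFER_UL
    dilution = total_ul / COMPOUND_UL
    return {
        "total_ul": total_ul,
        # ASSUMPTION: 12 uM is reached in the admixture, before Detection Buffer.
        "implied_stock_um": FINAL_COMPOUND_UM * dilution,
        # mg/ml multiplied by ul gives ug.
        "protein_ug": MEMBRANE_MG_PER_ML * MEMBRANE_UL,
    }


if __name__ == "__main__":
    for key, value in well_summary().items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions the admixture is 93 µl and the compound stock would be about 372 µM. Note that 40 µl of a 0.60 mg/ml suspension carries 24 µg of protein, whereas the text quotes 30 µg/well (which would correspond to 50 µl at 0.60 mg/ml, or 40 µl at 0.75 mg/ml); the specification does not reconcile the two figures.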

Although a variety of expression vectors are available to those in the art, it is most 
preferred that the vector utilized be pCMV. This vector was deposited with the American 
Type Culture Collection (ATCC) on October 13, 1998 (10801 University Blvd., Manassas, VA 
20110-2209 USA) under the provisions of the Budapest Treaty for the International Recognition 
of the Deposit of Microorganisms for the Purpose of Patent Procedure. The vector was tested by 
the ATCC and determined to be viable. The ATCC has assigned the following deposit number 
to pCMV: ATCC #203351. A diagram of pCMV (including restriction sites) is set forth in Figure 
18. 

It is intended that each of the patents, applications, and printed publications mentioned in 
this patent document be hereby incorporated by reference in their entirety. As those skilled in the 
art will appreciate, numerous changes and modifications may be made to the preferred 
embodiments of the invention without departing from the spirit of the invention. It is intended that 
all such variations fall within the scope of the invention. 



WO 00/06597    PCT/US99/17425 



Appendix A 
Figure 3 Grid Code 



A2 - amygdala; A3 - caudate nucleus; A4 - cerebellum; A5 - cerebral cortex; A6 - frontal cortex; 
A7 - hippocampus; A8 - medulla oblongata 

B1 - occipital cortex; B2 - putamen; B3 - substantia nigra; B4 - temporal cortex; B5 - thalamus; 
B6 - sub-thalamic nucleus; B7 - spinal cord 

C1 - heart; C2 - aorta; C3 - skeletal muscle; C4 - colon; C5 - bladder; C6 - uterus; C7 - prostate; 
C8 - stomach 

D1 - testis; D2 - ovary; D3 - pancreas; D4 - pituitary gland; D5 - adrenal gland; D6 - thyroid; 
D7 - salivary gland; D8 - mammary gland 

E1 - kidney; E2 - liver; E3 - small intestine; E4 - spleen; E5 - thymus; E6 - peripheral leukocyte; 
E8 - lymph node; E9 - bone marrow 

F1 - tonsil; F2 - lung; F3 - trachea; F4 - placenta 

G1 - fetal brain; G2 - fetal heart; G3 - fetal kidney; G4 - fetal liver; G5 - fetal spleen; G6 - fetal 
thymus; G8 - fetal lung 
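When reading dot-blot results against this grid, it can help to hold the code as a lookup table. The sketch below is one possible way to do that; it copies the coordinates exactly as printed in Appendix A (including the positions the appendix does not list, such as E7 and G7), and the names `ROWS` and `parse_grid` are illustrative, not part of the specification.

```python
# Grid coordinates and tissues as printed in Appendix A (OCR spacing normalized).
ROWS = [
    "A2 - amygdala; A3 - caudate nucleus; A4 - cerebellum; A5 - cerebral cortex; "
    "A6 - frontal cortex; A7 - hippocampus; A8 - medulla oblongata",
    "B1 - occipital cortex; B2 - putamen; B3 - substantia nigra; B4 - temporal cortex; "
    "B5 - thalamus; B6 - sub-thalamic nucleus; B7 - spinal cord",
    "C1 - heart; C2 - aorta; C3 - skeletal muscle; C4 - colon; C5 - bladder; "
    "C6 - uterus; C7 - prostate; C8 - stomach",
    "D1 - testis; D2 - ovary; D3 - pancreas; D4 - pituitary gland; D5 - adrenal gland; "
    "D6 - thyroid; D7 - salivary gland; D8 - mammary gland",
    "E1 - kidney; E2 - liver; E3 - small intestine; E4 - spleen; E5 - thymus; "
    "E6 - peripheral leukocyte; E8 - lymph node; E9 - bone marrow",
    "F1 - tonsil; F2 - lung; F3 - trachea; F4 - placenta",
    "G1 - fetal brain; G2 - fetal heart; G3 - fetal kidney; G4 - fetal liver; "
    "G5 - fetal spleen; G6 - fetal thymus; G8 - fetal lung",
]


def parse_grid(rows):
    """Map each grid coordinate (e.g. 'A2') to its tissue name."""
    grid = {}
    for row in rows:
        for entry in row.split(";"):
            # Split on ' - ' with spaces so 'sub-thalamic' stays intact.
            code, tissue = entry.strip().split(" - ", 1)
            grid[code.strip()] = tissue.strip()
    return grid


GRID = parse_grid(ROWS)

if __name__ == "__main__":
    for spot in ("A2", "B6", "E7"):
        print(spot, "->", GRID.get(spot, "not listed in Appendix A"))
```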
CLAIMS 

What is claimed is: 

1. A method for directly identifying a candidate compound as a compound selected 
from the group consisting of an inverse agonist, a partial agonist and an agonist, to an endogenous, 
constitutively active G protein coupled orphan receptor, comprising the steps of: 

(a) contacting a candidate compound with GPCR Fusion Protein, said GPCR Fusion 
Protein comprising an endogenous, constitutively active G protein coupled orphan 
receptor and a G protein; and 

(b) determining, by measurement of the compound efficacy at said contacted receptor, 
whether said compound is an inverse agonist, a partial agonist or an agonist of said 
receptor. 

2. The method of claim 1 wherein the compound is directly identified as an inverse 
agonist to said orphan receptor. 

3. The method of claim 1 wherein the compound is directly identified as an agonist 
to said orphan receptor. 

4. The method of claim 1 wherein the compound is directly identified as a partial 
agonist to said orphan receptor. 

5. A composition comprising a compound identified by the method of claim 2. 

6. A composition comprising a compound identified by the method of claim 3. 

7. A composition comprising a compound identified by the method of claim 4. 

8. The method of claim 1 wherein said orphan receptor is selected from the group 
consisting of: GPR3, GPR4, GPR6, GPR12, GPR21, OGR1, GHSR, RE2 and 
AL022171. 

9. The method of claim 1 wherein said orphan receptor is GPR6. 

10. The method of claim 1 wherein said G protein is selected from the group 
consisting of: Gs, Gi, Gq and Go. 

11. The method of claim 1 wherein said G protein is Gsα. 

12. A method for directly identifying a candidate compound as a compound selected 
from the group consisting of an inverse agonist, a partial agonist and an agonist, to an endogenous, 
constitutively active G protein coupled orphan receptor, comprising the steps of: 

(a) contacting a candidate compound with GPCR Fusion Protein, said GPCR Fusion 
Protein comprising an endogenous, constitutively active G protein coupled orphan 
receptor and a Gsα protein; and 

(b) determining, by measurement of the compound efficacy at said contacted receptor, 
whether said compound is an inverse agonist, a partial agonist or an agonist of said 
receptor. 

13. The method of claim 12 wherein said orphan receptor is selected from the group 
consisting of: GPR3, GPR4, GPR6, GPR12, GPR21, OGR1, GHSR, RE2 and 
AL022171. 

14. The method of claim 12 wherein said orphan receptor is GPR6. 

15. The method of claim 14 wherein said compound is directly identified as a 
compound selected from the group consisting of an inverse agonist and an agonist. 

16. The method of claim 15 wherein said compound is an inverse agonist. 

17. A composition comprising the compound of claim 16. 

18. A method for modulating a G protein coupled orphan receptor comprising 
contacting said receptor with a compound identified by the method of claim 1. 

19. A method for modulating a G protein coupled orphan receptor comprising the step 
of contacting said receptor with a compound identified by the method of claim 12. 
Pst I 
Ava I 
Nci I 



EcoR V 
Ipnd m| Ecj)R I 



Nci I 
Sma I 



jSaipH I^pa ipa I 



BsrB I 
Not I 
I^ae m 



Sac n 

l^stX I pac I 



AAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCACTAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCTCCAGCTTTT 



I 1 1 1 ¥ 



H V 



i 1 1 1 1 1 h 



TTCGAACTATAGCTTAAGGACGTCGGGCCCCCTAGGTGATCAAGATCTCGCCGGCGGTGGCGCCACCTCGAGGTCGAAAA 
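The two 80-base lines printed above are the two strands of the same stretch of sequence, and the cloning sites labelled in the figure can be located in the top strand by simple string search. The sketch below is a reader's check rather than part of the disclosure: the recognition sequences in `SITES` are the standard ones for these enzymes and are supplied here, not read from the figure, and the names `COMPLEMENT` and `find_sites` are illustrative.

```python
# First 80 bases of the map, top strand and the strand printed beneath it.
TOP = ("AAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCACTAGTTCTAGAGC"
       "GGCCGCCACCGCGGTGGAGCTCCAGCTTTT")
BOTTOM = ("TTCGAACTATAGCTTAAGGACGTCGGGCCCCCTAGGTGATCAAGATCTCG"
          "CCGGCGGTGGCGCCACCTCGAGGTCGAAAA")

# Base-by-base complement table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard recognition sequences (supplied for illustration, not from the figure).
SITES = {
    "HindIII": "AAGCTT",
    "EcoRV": "GATATC",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "BamHI": "GGATCC",
    "SpeI": "ACTAGT",
    "XbaI": "TCTAGA",
    "NotI": "GCGGCCGC",
    "SacII": "CCGCGG",
    "SacI": "GAGCTC",
}


def find_sites(seq):
    """Return the 0-based offset of the first occurrence of each site (-1 if absent)."""
    return {name: seq.find(site) for name, site in SITES.items()}


if __name__ == "__main__":
    print("strands complementary:", TOP.translate(COMPLEMENT) == BOTTOM)
    for name, offset in sorted(find_sites(TOP).items(), key=lambda kv: kv[1]):
        print(f"{name:>8} at offset {offset}")
```

Offsets are 0-based positions in the top strand; the figure's own ruler counts from 1 and marks position 80 at the end of this line.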

KLDIEFLQFGGSTSSRAAATAVELQLL 
SL ISNSCSPGDPLVLERPPPRWSS SF 
QA YRiPAARGIH F SGRHRGGAPAF 

1 1 1 ! ! \ 1 1 1 1 1 1 1 1 1 1 h 

LSSISNRCGPPDVLELAAAVATSSWSK 
LKIDFEQLGPSGSTRSRGGGRHLELKG 
AQYRIGAARPIW N • LPRWRPPAGAK 



80 



Bss^ n 

GTTCCCTTTAGTGAGGGTTAATTGCGCGCTAGAGGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGAC 



1 1 ! 1 1 h 



H 1 1 1 1 1 1 1 1 h 160 



CAAGGGAAATCACTCCCAATTAACGCGCGATCTCCTAGAAACACTTCCTTGGAATGAAGACACCACACTGTATTAACCTG 

FPLVRVNCALEDLCEGTLLLWCD! IG 
CSL- GLIAR RIFYKEPYFCCVT LD 

VPFSEG • LRARGSL • RNLTSVV • HNWT 



I 1 h 



H 1 1 1 1 1 1 1 1 1 1 1 h 



NGKTLTLQASSSRQSPVKSRHHSICI PC 
ER HPNIAR LIKTFSG KQPTVYNS 
TGKLSP - KRALPDKHLFRVETTHCLQV 



AAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAACTGTATAATGTGTTAAACTACTGATTCTAAT 



H 1 H 



i 1 1 1 1 1 1 1 1- 240 



TTTGATCGATGTCTCTAAATTTCCACATTCCATTTATATTTTAAAAATTCACATATTACACAATTTGATGACTAAGATTA 

QTTYRDLKL GKYKIFKCIUC • TTOSN 
KLPLEI SSKVKI KFLSV CVKLLI Ll 

NYLQRFKALR* I KF VYNVLKY F- 



H 1 1 1 1 1 I 1 1 1 1 h 



H ¥ 



VV LSKPS PLYLIKLHI IH VVSEL 
LS6VSI LELTFIFNKLTYHTLSSIRI 
F RCLNLARLYIYFK TYLTNF QN N 



TGTTTCTGTATTTTAGATTCCAACCTATGGAACTGATGAATGGCACCAGTGGTGGAATGCCTTTAATGAGGAAAACCTGT 

I 1 I 1 1 I 1 1 1 1 1 1 1 1 1 1 h 320 

ACAAACACATAAAATCTAAGGTTCGATACCTTGACTACTTACCCTCGTCACCACCTTACGCAAATTACTCCTTTTGGACA 

CLCILDSNLWN- IIGAVVECL* GKPV 

VCVF' (PTYGTDEWEQWWNAFNEENL 
LFVYFRFQPUELUNGSSGGUPLtfRKTC 

I I 1 1 1 1 1 1 I 1 1 1 1 1 1 1 h 

QKHIKSELRHFQH IPATTSHR • HPFGT 
TQTN - ICV ' PVSSHSCHHFAKLSSFRN 
NTYKLNWGISSIFFLLPPIGKILFVq 






TTTGCTCAGAAGAAATGCCATCTAGTGATGATGAGGCTACTGCTGACTCTCAACATTCTACTCCTCCAAAAAAGAAGAGA 

1 1 1 1 1 1 f 1- i I 1 I 1 1 I h 400 

AAACGAGTCTTCTTTACGCTAGATCACTACTACTCCGATGACGACTGAGAGTTGTAAGATGAGGAGGTTTTTTCTTCTCT 

LLRRNAI • • • GYC • LS^TPYSSKKEE 

F C S E E y PSS DDE AT A D SQHS T PPKK K R 
F A Q K K C HLVMMRLLLTLNILLLQKRRE 

1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 h 

KSLLFAM HHHP QQSEVN EELFSSF 
Q E S S I G DLSSSAVA SE • CBV6GF F F L 
K A F F H W RTI ILSSSVRLMRSRWFLLS 

Sty I 

AAGGTAGAAGACcdcAAGGACTTTCCTTCAG AATTGCTAAGTTTTTTGAGTCATGCTGTGTTTAGTAATAGAACTCTTGC 

I I I I 1 1 1 1 1 I I I I I I 1- 480 

TTCCATCTTCTGGGGTTCCTGAAACGAAGTCTTAACGATTCAAAAAACTCAGTACGACACAAATCATTATCTTGAGAACG 

KGRRPQGLSFRIAKFFESCCV - • NSC 
K V E D P K D FPSELLSFLSHAVFSNRTLA 
R K T P R TFLQNC • V F • VMLCLV I E L L 

I ! 1 1 1 1 1- 1 1 1 i 1 1 1 1 1 1- 

FLLGWPSEKLIALNKSDHQT 'YYFEQ 
F T S S G L SKGESNSLKKL ATNLLLVRA 
L Y F V G L V K R • F Q • T K Q T M S H K T I S S K S 



TTGCTTTGCTATTTACACCACAAAGGAAAAAGCTGCACTGCTATACAAG AAAATTATGGAAAAATATTCTGTA A CCTTTA 

1 1 I I \ I 1 H i 1 1 1 1 » 1 K 560 

AACGAAACGATAAATGTGGTCTTTCCTTTTTCGACGTGACGATATGTTCTTTTAATACCTTTTTATAAGACATTGGAAAT 

LLCYLHHKGKSCTAIQEKYGKIFCNLY 
C F A I Y T T KEKAA L L YKK I MEKYS V T F 
L A L L F T P Q R K K L H C Y T R K L W K N I L • PL 

I 1 \ 1 1 1 1 1 \ 1 1 » 1 1 1 1 ^ 

RSQ • KCWLPFLQVAICSF PFIN QLR • 
Q K A 1 • V V P S F A A S S Y L F I I S F Y E T V K I 
A K S N V G C L F F S C Q' • V L F N H F F 1 R Y G K 

Asel 

TAAGTAGGCATAACAGTTATAATCATAACAT ACTGTTTTTTCTTACTCCACACAGGCATAGAGTCTCTGCTAfTAATAAC 

1 \ 1 I I 1 1 # 1 I I I I 1 1 I o40 

ATTCATCCCTATTCTCAATATTAGTATTCTATGACAAAAAAGAATGAGCTGTGTCCCTATCTCACAGACGATAATTATTC 

K A QL S 'HTVPSYSTQA SVCY - • 
I SR HN S YH HN I LFFLTPHRHRVSA I NN 
• V G I T V I I I Y Y C F P L L H T C I E C L L L IT 

I 1 1 1 1 1 H- 1 1 1 1 1 1 1 1 1 1- 

LYAYCNYDYCVTKE • EVCAYLTQ - YS 

L L C L L • L • L « S N K R V G C L C L T D A I L L 
Y T P k V Tl IUVYQKKKSWVPMSHRSN I V 



Rsa 



I 



TATGCTCAAAAATTGTCTACCTTTAGCTTTTTA ATTTGTAAAGGGGTTAATAAGGAATATTTGATGTATAGTGCCTTGAC 

I I 1 1 1 I I 1 1— 1 1 1 ■ I 1 1 1 720 

ATACCAGTTTTTAACACATGGAAATCGAAAAATTAAACATTTCCCCAATTATTCCTTATAAACTACATATCACGGAACTG 

LCSKIVYL L FNL RG- CIFDY CLD 

Y A Q K L C T F S P L I C K G V K K E Y L M Y S A L T 
y L K N C V P L A P • P V K G L I R N I • C I VP • 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1- 

HEPITYR SKLKYLP YPINSTYHRS 
. A • P N H V K L K K I Q L P T L L S Y K I Y L A K V 
I S L F Q T G K A K • N T F P N I LP I Q H I T C Q S 
BsaB I Dra I 

TAGACATCATAATCAGCCATACCACATTTCTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACAC^ 
ATCTCTAGTATTAGTCCGTATGGTGTAAACATCTCCAAAATGAACGAAATTTTTTGGAGGGTGTGGAGGGGGACTTGGAC 



P<5.«!AIPHL-RFYLL'KTSHTSP-T- 
BOH H 0 P Y H I C R G F T C F K K P P T P P P E P 
L \ °I I % H T T F V E V L L . A L K M L . P H L P L N L 

I I 1 1 1 I 1 1 1 1 1 1 1 ' ' ' »■ 

l' n Y n A M G C K Y L N KS FVEWVEGQVQ 
I S . L • C Y W M Q L P K V Q K L F G G V G G G S G S 
S I M I L W V V H T S T K S A K F F R G C R G R F R 

. Hinc n 

Mfel Hpa 1 

aaacataaaatgaatccLttgttcttgtt'aacttgtttattgcagcttataatggttacaaataaagcaatagc^ 880 

TTTGTATTTTACTTACGTTAACAACAACAATTGAACAAATAACGTCGAATATTACCAATGTTTATTTCGTTATC6TAGTG 



Mitr uOLLLLTCLLQLIUVTNKAIAS 
p T . M E c N C C C • L V Y C S L • W L Q I K Q • H H 
\ \ K \ S ^ A S ''v V. H \ F ^ A. A Y N G Y K • S H S I T 

I I I 1 1 1 1 \ 1 1 1 ' ' ' * ' ^ 

F M F H I C H„ H V Q K N C S I I T V F L A I A D C 

V Y F S H L Q Q 9 • ^. ^ I P L Y L L L 11 V 

PCLIFAITTTLKNIAA - LP LYLLLMV 



Xba 1 

AA£rTTCACAAATAAAO^^ 

TTTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACATAGAATAGTACAGA 

\ 'f % \ s ^ % ^ 'a 'f ^ \ \ % s \ \ \ \ \ \ w \ 

\ % "t \ K \ 'f ^/s \ \ S S. C > L S K L I M V S Y H V 

' , Z r K K • Q U RTTTQGFEDIYRIUO^ 

I M ^. I Y L U K K V A N • N H H T W V • • H I K D H B 
S "k V S \ '•a ^ S C E L Q P K D L S M L T D - T • ^ 

I Nsi 1 

agatcttgtggaatgtgtgtcagttaggctgtggaaagtccccaggctccccagcaggcagaagtatgcaaagcatgcat ^^^^ 
tctagaacaccttacacacagtcaatcccacacctttcaggggtccgaggcgtcgtccgtcttcatacctttcgtacota 

pqrrucVS • GVESPQAP<)QA^EVCKA,CI 

\\\\\\\\ \\ \ \ > % > % \ 's/. > 's 'm 

I I I 1 I 1 I I I ■ I I I I I ' ' ' 

I no P IH T L • P T S L G W A G W C A S T H L^A^HU 
<i R T q H T D T L T H F T G L S G L L C F Y A F C , A^ D 
I K H F T H °. N P H P F D G P E G A P L L I C LUC 
h I 
si I 



CTCAATTACTCAGCAACCAGCTGTCGAAAGTCCCCAGGCTCCC CAGCAGGCA CAAGTATCCAAAGCATGCATCTCAATTA , , „^ 

1 1 1 ! 1 \ 1 ♦ i 1 i 1 1 ! 1 h 1120 

GAGTTAATCAGTCGTTGGTCCACACCTTTCAGGGGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAAT 

SISQQPGYESPQAPQQA^EVCKACIS I 
S QL V S NQVWKVPRLPSRQKYAKHASQL 
L N • SAT RCGKSPGSPAGRSMQSMHLN 



I 1 1 H 



H 1 1 1 H 



H 1 H 



H h 



E I L • CGPTS LGWAGWC A S^THLAHME I L 
• N T L V W T H F T G L G L L C F Y A^ F A * K 

RL • DALLHPFDGPEGAPLL I CLMCRL • 



Nco I 
pty I 



GTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCCCCATG 



H 1 1 1 i 1 1 1 — — ^ 1 1 1 ^ 



\ 1- 1200 



CAGTCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGAGGCCCGTCAAGGCGGGTAAGAGGCGGGGTAC 

SQQP SRP • LRPSRP - L^RP^VPPILRPy 
V S N HSPAPNSAHPAPNSAQFRPPSAPW 
S A T I V PPLTPPIPPLTPPSSAHSPPH 

I 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 h 

• CGYDRG SRGDRG - SRGTGGMRRGM 

TLVWLGAGLEAWGAGLEAWNRGNEAGH 
D ALM TCGRVGGMCGRVGGLEAWEGGWP 

Hae m Hae fflP^Slae m 

GCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCCCCT CGGCCTCTGAGCTATTC CAGAAGTAGTGAGGAGGCTTT ^ 

I I 1 1 ! 1 I h- 1 1 I 1 11! h 1280 

CGACTGATTAAAAAAAATAAATACCTCTCCGGCTCCGGCGGACCCGGAGACTCGATAAGGTCTTCATCACTCCTCCGAAA 

AD FFLFMQRP RPPRPLSYSRSSEEAF 
L T M F F Y L C R G R G R L G L A 1 P^ E V V R R L 
GLIFFIY A E A E A A S A S E L F Q K • G G F 



I 



H h 



H 1 1 1 1 h 



H 1 h 



AS • HKKN I CLGLGGRGR^L • ELLLSSAK 
s/v L K K - K H L P R^ P^ R^ R„ P^ R^ A I G S T T L L S K 
QSIKKI ASASAAEABSSNWFYHPPK 

Hae m 
3tu I 
Avr n 
ty I 

srfcCTAGCCTTTTGCAAAAAGCTCCCT'CGAGAGCTTGCCGTAATCATGGTCATAGCTGTTTCCTCTGTGAAATT^ 



TTTGGA 



Ava I 
Xho I 



H 1 1 1 1 1 1 1 1 1 1 1 H360 



AAACCTCCGGATCCGAAAACGTTTTTCGAGGGAGCTCTCCAACCGCATTAGTACCAGTATCCACAAAGGACACACTTTAA 

LEA AFAKSSLESLA 'SWS - LPPV H 
F W R PR L L Q K A PS RAWRMHGHS C F LCE I 
F G G L G F C K K L P R E L C V I M V I A V S C V K L 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1- 

KSA AKAFLERSLRAYDHDYSNGTHFQ 
Q L CL S K CFAGELA QRL • P • LQKRHS I 
K P P R P K Q L F S G R S S P T I M T M A T E Q T F N 



rb I 



GTTATCCGCTCACAATTCCACACAACATACGAGCCGGAACCATAAAGTGTAAAGCCTGGGGTCCCTAATGAGTGAGCTAA 

1 1 \ 1 1 1 1 1 1 1 1 1 ! 1 1 h 1440 

CAATAGGCGAGTGTTAAGGTCTGTTGTATGCTCGGCCTTCGTATTTCACATTTDGGACCCCACGGATTACTCACTCGATT 

C Y P L T 1 P H N I R A G S I C„ K A^ G^ A • • V S • 

VI RSQPHTTYEPEA S^VKPG^VPNE • AN 

LSAHKSTQHTSRKHKV • SLGCLIISEL 

I 1 1 1 1 1 1 1 1 1 \ 1 1 1 1 1 \ 

' GSVIG CLURAPLUFHLAQPA HTL • 

T IRECNWVVYSGSAYLTFGPTGLSHA I 
NDA • LEVCCVLRFCLTYLRPHRILSSV 

Asel Pvji n |Asel Hae m 

CTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCA .^^^ 
1 1 1 \ 1 1 1 1 1 1 1 1 1 1 i K 1520 

GAGTGTAATTAACGCAACGCCAGTGACGGGCGAAAGGTCAGCCCTTTGGACAGCACGGTCGACGTAATTACTTAGCCGGT 

L T L I A L R S L P A F Q G„ C H I G Q 

SH • LRCAHCPLSSRET^C^R^A^SC INESA 
THl NCVALTARFPVGKPVVPAALMNRP 

I 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 h 

SVN I ANRESCAKWDPFR^D^HWS^C • H I PW 
EC - KRQA QGSELRSVQRALQMLSDAL 
•ULQTASVARKGTPFGTTGAANI FRG 



Sap I 



ACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCCCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGC 

1 1 1 1 1 1 1 1 1 i 1 I 1 1 1 1- 1600 

TGCGCGCCCCTCTCCGCCAAACGCATAACCCGCGAGAAGGCGAAGGAGCGAGTGACTGAGCGACGCGAGCCAGCAAGCCG 

RAGRGGLRIGRSSASSLTDSLRSVVR 
N A R G E A V C V L G A L„ P^ L„ P, R^ L R^ A, R^ F^ G^ 

TRGERRFAYWALFRFLAH • LAAL GRSA 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1- 

R A P L P P K R 1 P R E E A E E« V S E S R E^ T R^ S 
ARPSATQTNPARGSGRES^VRQARDNP 
V RP S L RN AYQASKRKRA • QSAASPREA 



BsrB I 

TGCCGCGAGCGGTATCAGCTCACTCAAAGCCCCTAATACCGTTATCCACACAATCAGCGGATAACGCAGGAAAGAACATG 

I I 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1- 1680 

ACGCCGCTCGCCATAGTCGAGTGAGTTTCCGCCATTATGCCAATAGGTGTCTTACTCCCCTATTGCGTCCTTTCTTGTAC 

LRRAVSAHSKAVIRLST^ES^GDNA^GKNU 
C6ERYQLTQRR YCYPQNQCITQERTC 
AASGI SSLKGGNTVIHRIRG RRKEH 

I 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 ^ 

RRATDA • EFATIRKDVSDPSLAPPFM 
Q P S R Y • S V • L R Y Y P • G C F • P I V C S L V H 
A AL P I LESLPPLVT I WL I L PYRLFSCT 
pae in pae m pie ffl 



TGAGCAAAAGGCCAGCAAAACGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGA 

1 ) 1 1 1 I I H- ! 1 1 1 1 1 1 h 1760 

ACTCGTTTTCCGGTCGTTTTCCGGTCCTTGGCATTTTTCCGGCGCAACGACCGCAAAAAGGTATCCGAGGCGGG6GGACT 

• AKGQQKARNRKKAALLAFPHRLRPPO 
E Q K A S KRPGTVKRPRCWRPS IGSAPL 
V S K R P A KGQEP • KGRVAGVPP • APPP • 

I 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 h 

HAPPWCFALFRLFAAN^S^A^N^KWLSRGGS 
SCF ALLLGPVTFLGRQQ^R. KEMPEAGRV 
LLL G A PPWSGYFPRTAPTKGYAGGCQ 

CGAGCATCACAAAAATCGACGCTCAAGTCACAGGTGGCGAAA CCCGACAGGACTATAAAG ATACCAGGCGTTTCCCCCTG . ^ ^ ^ 

1 1 1 1 I 1 1 i 1 1 1 1 1 I — I *• 1840 

GCTCGTAGTGTTTTTAGCTGCGAGTTCAGTCTCCACCGCTTTGGGCTGTCCTGATATTTCTATGGTCCGCAAAGGGGGAC 

E H H K N R R S„ R^ R^ P^ T G L • R <J^ A^ F^ P^ P, 

TS I TK I DAQVRGGETRQDYKDTRRF P L 
RASQKSTL KSEVAKPDRTIKIPGVSPW 

I 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 h 

SC< LFRREL LHRFGVPSYLYWAMCGP 
LUVPISA TLPPSVRCS' LSVLRKGR 

RADCFDVSLDSTAFGSLVI PIGPTEGQ 

GAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTG 

1 I 1 1 ! 1 1 1 I I 1 1 1 1 1 V 1920 

CTTCGAGGGAGCACGCGAGAGGACAAGGGTGGGACGGCGAATGGCCTATGGACAGGCGGAAAGAGGGAAGCCCTTCGCAC 

GSSLVRSPVPTLPLTGYLS^APLPSGSV 
E APS C A L L P R P^ C R L„ P„ D C P^ P^ F^ R^ E^ A„ W 

KLPRALSCSDPAAYRIPVRLSPFGKR 

I 1 1 1 1 1 1 "+ 1 1 1 1 1 1 1 1 y 

LERTREGTGVRGSVPYR^DAKRGEPLT 
SA GEHARRNRGQRKGSVQGGKERRSAH 
F S G R A S EQESCAA RIGTRREGK PFRP 

ApaL I 

gcgctttctcaatgctcacgctgtaggtatctcagtt cggtgtaggtcgttcgctccaagctgggctgtgWcacgaacc 

I I 1 1 I I 1 I I 1 1 1 I I I K 2000 

cgcgaaagagttacgagtgcgacatccatagagtcaagccacatccagcaagcgaggttcgacccgacacacgtgcttgg 
alsqcsrcrylssv • vvrsklgcvhep 

R F L N A H A V C„ I R^ C R^ F^ A^ P^ A, K 

GAFSMLTL ' vsqfgvgrslqaglcart 
I 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1- 

a S E • H E R Q L Y R„ L E„ T ^ T T R E L S P Q T C S G 
R K R L A • A T P^ E T R„ H L D H A G L Q A T H V P G 
AKE I SVSYTD - NPTPRESYAPSHARV 



Nci 



I 



ccccgttcagcccgaccgctgccccttatccggtaactatcgtcttgagtccaacccggtaagacacgacttatccccac ^^^^ 

I I I I I I I I ♦ 1 1 I I i I 1 I 2080 

ggggcaagtcgggctggcgacgcggaataggccattgatagcagaactcaggttgggccattctgtgctgaatagcggtg 

pvqpdrcalsgnyrleshpvrhdlsp 
pppsptaapypvtivlsp^tr- dttyrh 

P R S A R FLRL IR • LSS • VQPGKTRL I AT 
( 1 \ 1 1 1 1 1 1 1 ! 1 » 1 1 1 h 

GT • GSRQAKDPL RRSDLGTLCSKDGS 
GNL G Y A AG GTVITK^LGVRYSVV • RW 
GREARGSRR I RYSDDQTWGPLVRS I AY 
e m 



TGGCACCAGCCACTCGTAACAGGATTACCAGACCGACCTATGTAGGCCGTGCTACAGAGTTCTTGAAGTGGTGGCCTAAC ^ 

1 I 1 1 1 i \ 1 1 1 1 1 1 1 1 e 2160 

ACCGTCGTCGGTGACCATTGTCCTAATCGTCTCGCTCCATACATCCGCCACGATGTCTCAAGAACTTCACCACCGGATTG 

LAAATGNR I SRARYVGG^AT^EFLKWWPN 
WQQPLVTGLAERGU* AVLQSS • SGGLT 
GSSHW ' QD • QSEVCRRCYRVLEVV A - 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ^ 

^ A A A V P L L I L L A^ L Y P P^ A^ V S K F H H G L 

Q C C G T V P N A S R^ P. I Y A T S C L E^ L P P R V 

PLLWQYCS • CLSTHLRH • LTRSTTA • S 

TACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTG AAGCCAGTTACCTT CGGAAAAAGAGTTGGTAGCTC 
1 1 1 1 ! 1 ! 1 i 1 1 ! -+ 1 1 h 2240 

ATGCCGATGTGATCTTCCTGTCATAAACCATAGACGCGAGACGACTTCGGTCAATGGAAGCCTTTTTCTCAACCATCGAG 

YGYTRRTVFGI CA LLKP^VTFGKRVGSS 
TATLEGQYLVSALC - SQ^LPSEKELVA 
LRLH 'KDSIWYLRSAEASYLRKKSW - L 

I 1 1 ! 1 1 1 1 1 1 1 J 1 1 1 1 1- 

• P • VLLVTNP IQAR^S^FGTVKPFLTPLE 
VAVSSPCYKTDASQQLWNG^ESFSNTAR 
RSC - FSL IQYRREASAL RRFFLQYS 

TTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGAT ^^^^ 

1 1 1 1 1 1 1 \ 1 1 1 1 1 1 1 1- 2320 

AACTAGGCCGTTTGTTTGGTGGCCACCATCGCCACCAAAAAAACAAACGTTCGTCGTCTAATGCGCGTCTTTTTTTCCTA 

• S G K Q T T A G S C G„ F^ F C K I T R R K K G 

L D P A N K P P L V A F^ L F A^ S S R^ L R E K K D 

LIRQTNHRW RWFFCLQAADYAQKKRI 

I 1 1 I ! 1 1 1 1 1 1 1 1 1 1 ! ¥ 

QDPLCVVAPLP PKKTQLCC I VRLFFPD 
SCAFLCGS^T ATTKKNA^LLL^NRASFFS 
KIRCVFWRQYRHNKQKCAAS • AC FFLI 



BspH I 



CTCAAGAAGATCCTTTCATCTTTTCTACGGGGTCTGACCCTCAGTGGAACGAAAACTCACCTTAAGGGATTTTGGTCATG 

1 1 I I 1 1 1 1 1 1 I 1 1 J 1 I 2400 

GAGTTCTTCTAGGAAACTAGAAAAGATGCCCCACACTGCGAGTCACCTTGCTTTTGAGTGCAATTCCCTAAAACCAGTAC 

SQEDPL I PSTGSDAQ^W^NE^N^S^R • G I LVII 
LKK I L • SFLRGLTLSGTKTHVKGFWS • 
SRR S FDLFYGV RSVERKLTLRDFGH 

I 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 ¥ 

• SSGKIKEVPDSA HFSFER • PIKTU 
RLFIRQQKRRPRVSLPVPV • TLPNQDH 
ELLOKSRK • PTQRETSRFSVKLSKP • S 



jDra I 



Dra I 



AGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGACTA^,^^ 
( 1 1 1 1 1 1 1 I 1 1 1 i 1 1 1 1-2480 

TCTAATAGTTTTTCCTAGAAGTGGATCTAGGAAAATTTAATTTTTACTTCAAAATTTAGTTACATTTCATATATACTCAT 

RLSKRIPT* ILLN K SFKSI SIYE- 
DYQKGSSPRSF - IKNEVLNQSKVYUS 
EIIKKDLHLDPFKLKIfKF - INLKYI V 

I 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 h 

LNDFL I KV • I RKF F^HLKL^D I • L I YSY 
S • • FPDEGLDK - I. LFS^TKP • DLTYI LL 
IILFSR RSGKLNFIFN - ILRFYIHT 






AACTTGCTCTGACAGTTACCAATGCTTAATCAGTGACGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTG ^^^^ 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 e 2560 

TTGAACCAGACTGTCAATGGTTACGAATTAGTCACTCCGTGGATAGAGTCGCTAGACAGATAAAGCAAGTAGGTATCAAC 

TWSDSYQCLISEAPIS^AI CLFRSS I V 
KLGLTVTNA SVRHLSQRSVYFVHP • L 
NLV QLPMLNQ GTYLSOLSISFIHSC 

I 1 1 1 1 1 1 \ 1 1 1 * 1 1 1 1 1- 

VQDSL • WHK I LSAG I EA^IQ^RNR^EDMTA 
S PRVTVLA • DTLCRD • RDT • KT • GYN 
F K T QCNG I SL • HPV • RLSRD I ENMWLQ 



ae m 



CCTGACTCCCCCTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAC ^ ^ 

1 1 1 1 1 — ■ 1 1 1 1 i 1 1 1 1 1 h 2640 

CGACTGAGGGGCAGCACATCTATTGATGCTATGCCCTCCCGAATGGTAGACCGGGGTCACGACGTTACTATGGCGCTCTG 

A LPVV - ITTIREGLPSGPSAA^MIPRD 

PD SPSCR • LRYGRAYHLAP^VLQ - YRET 
LTPRRVDNYDTGGLT IWPQCCNDTAR 

I 1 ! \ 1 1 1 1 1 1 1 i 1 1 1 1 h 

QSGT TYIVVIRSPKGDPGLAAI IGRS 
G S EGDHLYSRYPLA • WR^AGTS^CHYRSV 
RVGRRTSL • SVPPSVMQGWHQLSVALG 



CCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTT ^^^^ 
H 1 1 I 1 1 1 I i 1 1 1 I I 1 ^ 2720 



GCTGCGAGTGGCCGAGCTCTAAATAGTCGTTATTTGGTCGGTCGGCCTTCCCGCCTCCCGTCTTCACCAGGACGTTGAAA 

PRSPAPDLSAINQPAGRAERRSGPATL 
H A H R L Q I Y Q Q • T S Q P E G P S A^ E V V L Q L 
P T L T G S RF I SNKPASRKGRAQKWSCNP 

I 1 1 1 1 1 1 1 1 1 1 » 1 1 1 ! y 

GREGAGSKDAI FWGAPLASRLLPGAVK 
W A • R S W I • C Y V L W G S P G L A S T T R^ C S 

V SV PELN I LLLCALRFPRACFHOQLK 

Asel Nci I ' Fsp I 

ATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAACTAGTTCGCCAGTTAATAGTTTGCGCAACGTTG ^^^^ 

1 I 1 i 1 1 I I 1 1 1 1 H 1 1 h 2800 

TACGCCGAGCTAGGTCAGATAATTAACAACCGCCCTTCCATCTCATTCATCAAGCGGTCAATTATCAAACGCGTTGCAAC 

SAS IQS INCCREARVSSSP^VNSLRNV 
Y P P P S S L L I V A G K L E • V R Q L I V C A^ T 
IRLHPVY • LLPGS • SK • FAS • • FAQRC 

I ! 1 1 1 \ 1 1 1 ^ 1 1 1 1 1 1 K 

D A E U W D I L q Q A^ T L E^ G T K R^ L T T 

GCGDLRNITAPFSSYTTRWNITQAVK^ 
IRRWCT- NNGPL • LLYNAL • YKACRQ 

TTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTA^ 

AACGGTAACGATGTCCGTAGCACCACAGTGCGAGCAGCAAACCATACCGAAGTAAGTCGAGGCCAAGGGTTGCTAGTTCC 

VAIATGIVVSRSSFGtfASPSSGSQRS^R 
L P L L Q A SWCHARRLVWLHS^APVPNDQG 
C H C Y R HRGVTLVVWYGF I QLRFPT I K 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ^ 

A U A V P U T T D R E« D P. !„ A^ E L E^ P^ E W R D 
HGNSCADHH ARRKTHS EAGTGLS - P 

Q W Q • L C R P T V S T T « Y P K M S R N G V I LA 
Ava II Pvu I Hae III 

CGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGCTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAACTTGGC 
1 1 1 1 1 i 1 1 1 1 1 1 1 I 1 h 2960 

GCTCAATGTACTAGGGGGTACAACACGTTTTTTCGCCAATCGAGGAAGCCAGGAGGCTAGCAACAGTCTTCATTCAACCG 

RVT - SPMLCKKAVSSFGPPIVVRSKLA 
ELHDPPCCAKKRLAPSVLRSLSEVSW 
ASYyiPHVVQKSG - LLRSSDRCQK - VG 
I 1 1 1 1 1 1 1 1 1 1 1 1 » 1 1 1- 

RTVHDGUNHLFATLEKPGGITTLLLNA 
SNCSGGHQAPFRNAGETRRDKDSTLQ G 
L MIGWTTCFLP SRRDESRQ • FYT P 

CGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGA 
1 1 1 1 I 1 1 1 I 1 1 1 1 1 1 h 3040 

GCGTCACAATAGTGAGTACCAATACCGTCGTGACGTATTAAGAGAATGACAGTACGGTAGGCATTCTACGAAAAGACACT 

AVLSLUVUAALHNSLTVUPSVRCFSV 
PQCYHSWLWQHCI ILLLSCHP - DAFL • 
RSVI THGYGSTA • FSYCHAIRKMLFCD 

I 1 1 1 1 1 ! 1 1 1 \ 1 1 1 1 1 »• 

A T N D S U T I A A C L R, G_ D T L H K E T V 

C H • • E H N H C C Q y I R„ K D H G^ Y S A K R^ H 
RLTIV P PLVAYNE Q AMRLISKQS 

Rsa I 

Sea I Nci I Hinc H 

CTGGTGAG AcTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGAT ^ . „^ 

1 1 1 1 1 1 1 ! 1 * 1 — • — I 1 1 1 1- 3120 

GACCACTCATGAGTTGGTTCAGTAAGACTCTTATCACATACGCCGCTGGCTCAACGAGAACGGGCCGCAGTTGTGCCCTA 

TGEYSTKSP - E - CMRRPSCSCPASTRD 
LVSTQPSHSENSVCGDRVALARR4HGI 
W • VL NQV I LR I VY.AATELLLPGVNTG 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 

PSYEVLDNQSYHIRRGLQEQGADVRS 
STLV GL ESFLTHPSRTARARR CPI 
QHTSLWTMRLITYAAVSNSKGPTLVPY 

pra I |Xixm I 

AATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCAT TGGAAAACGTTCTTCGGGGC CAAAACTCTCAAGGATCTT^^^^ 

I 1 I I ! 1 1 1 1 1 1 i 1 1 ! 1 1- 3200 

TTATCGCGCGGTGTATCGTCTTGAAATTTTCACGAGTAGTAACCTTTTGCAAGAAGCCCCGCTTTTGAGAGTTCCTAGAA 

NTAPHSRTLKVLI IGKRSSGRKLSRIL 

IPRHIAEL KCSSLENVLRGENSQGS 
•YRAT ONPKSAHHWKTFFGAKTLKDL 

I— I 1 1 1 1 1 1 1 1 1 * 1 1 1 1 h 

LVAC CLLVKFTSMIIPF^R^EEPRFSEL I K 
IGRWUASS FHEDHSFTRRPSPE -PD* 

YRAVYCFKLLA- QFVNKPAFVRLSR 
ApaL I 

ACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTT 

1 1 1 \ 1 1 ! 1 1 i 1 1 1 ! 1 ¥ 3280 

TGGCGACAACTCTAGGTCAAGCTACATTGGGTGAGCACGTCGGTTGACTAGAAGTCGTAGAAAATGAAAGTGGTCGCAAA 

PLLRSSSM - PTRAPN • SSASFTFTSV 
YRC • DPVRCNPLVHPTDLQHLLLSPAF 
T A VE I QFDVTHSCTQL I FS I FYPHQRP 

I 1 1 1 ! 1 1 1 1 1 1 ! 1 f 1 1 1- 

GSNLDLEIYGVRAGLQDEADKVK VLTE 
RQQSGTRHLGSTCGVSR CRKSEGAN 
V A T S I W NSTVWEHVWS IKLUK • K • WRK 

CTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTC ^^^^ 

1 1 1 I 1 1 1 + i 1 1 1 1 I I ¥ 3360 

GACCCACTCGTTTTTGTCCTTCCGTTTTACGGCGTTTTTTCCCTTATTCCCGCTGTGCCTTTACAACTTATGAGTATGAG 

SG AKTGRQNAAKKGIRATRKC - ILIL 

LGEQKOEGKtfPQKRE * GRHGNVEYSYS 
WVS KN RKAKCRKKGNKGDTEMLNTHT 

I 1 1 1 i 1 1 1 1 1 1 1 1 1 1 h 

PHAFVPLCFAAFFPILAVRFHQISMS 
RPS C FCSPL IGCFLSYPRCPFTSYEYE 
Q T L L F L F A F H R L F P F L P S V S I N F V • V R 

Hinc n Spe I Asel 

TTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGCGCGTTGACATTGATTATTGACTAGTTATTAA ^ ^ ^ „ 

1 1 1 1 1 I 1 I I i I I 1 I I I 3440 

AAGGAAAAAGTTATAATAACTTCGTAAATAGTCCCAATAACAGAGTACGCGCAACTGTAACTAATAACTGATCAATAATT 

FLFQYY SIYQGYCLyRVDIDY - LVIN 
SFFNI lEAFIRVIVSCALTLI ID - LL 
L P F S I LLKH LSGLYSHAR • H • LLTSY • 

I 1 1 1 1 -H 1 1 1 1 1 1 1 1 i 1 y 

KRK Y QLM' P QRMRTSUS QSTIL 

E K K L I I S A N I L T I T E H A V I I S • K N I 

G K E I M KFCKDPNND ARQCQNNVL- • 



e m 

B^l I 



r 



TAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCC ^^^^ 

I I 1 ( 1 1 1 I I 1 1 1 I I 1 I !■ 3520 

ATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTATATACCTCAAGGCGCAATGTATTGAATGCCATTTACCGGGCGG 

SNQLRGH - FIAHIWSSALHNLR MAR 
I V I K Y G V I S S • P I Y G V P R Y I T Y G K W P A 
• S I T GSLVHSPYMEFRVT • LTVNCP P 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1- 

LL NRP' NHAWIHLEANCLKRYIARR 
T I L P TIILEYGMYPTGR • MV • PLHGA 

Y Y D IV P DKT • LGYISKRTVYSVTFPGG 
t n 



TGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC 
1 1 \ 1 \ 1 1 ! 1 1 1 1 1 1 1 1- 3600 

ACCGACTGGCGGGTTGCTGGGGCCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTCCGGTTATCCCTGAAAGG 

LADBPTTPAH RQ- RUFP- RQ -GLS 

WLTAQRPPP IDVNNDVCSHSNANRDPP 
G PPNDPRPLTSIMTYVPIVTPIGTF 

I 1 \ 1 1 1 1 1 1 1 1 1 1 1 1 1 h 

ASRGVVCAWQR YHRINGYYRWYPSE 
QSVAWRGGGySTLLSTHEWLLALLSKG 
PQGGLSGRGNVDIIVYTGMTVGIPVKW 

|4at n pgl I psa I ^de I pa I 

ATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCC 

1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 h 3680 

TAACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGG 

IDVNGWTIYGKLPTWQYIKCI ICQVRP 
LTSMGGLFTVNCPLGSTS^SVSYAKYA 
H • RQWVDY LR • TAHLAVHQ VYHyPSTP 

I 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 h 

USTLPHVI PLSGVQCYHLHIUHWTRG 
NVDIPPSNVTFQGSPLVDLTDYALYAG 
QR HTS KRYVAWKATC TY - IGLVG 



Hae m 



Mi n Pgl I Psa I 



CCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCA 

1 1 1 1 \ I 1 1 1 1 1 1 I 1 1 h 3760 

CGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTACTGGAATACCCTGAAAGGATGAACCGT 

LLTSMTVNGPPGIMPST • PYGTFLLG 
PY • RQ * R - UARLALCP^VHD LUGLSYLA 
PIDV NDGKWPAWH YAQYMTLWDFPTWQ 

I 1 1 1 1 1 1 1 1 1 1 1 1 J 1 1 1- 

RHVDIVTFPGGPICIGLVHG • PVKRSPL 
•QR HRYIARR^ANHGTCSRIPSE KA 
GI STLSPLHGAQC - AWYMVKHSKGVQC 



BsaA I Nco I 

psa I pnaB I pty I pa I 



GTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCCCTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTC , ^ 
I 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 h 3840 

CATGTAGATGCATAATCAGTAGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCTATCGCCAAAC 

STSTY* SSLLPW CGPGSTSUGVDSGL 
VHLR I SHRYYHGDAVLA^VHQWAWI AV • 
YIYVLVIAITUVMRFVQYINGRG - RF 

I 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 h 

VDVY DDSNGHHHPKPLVDtPTSLPK 
TCRRIL R- WPSATK^ATC HAHIATQ 

Ylf - TNTIIAIVUTIRNQC YMLPRPYRHS 
Aat II 

ACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCA 

1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 h 3920 

TGAGTGCCCCTAAAGGTTCAGAGCTGGGGTAACTGCACTTACCCTCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGT 

THGDPQ V S T PLTS^UGV^C^FG T^K I NOT FQ 
L T G I S K S P P H • R « E„ P V L A P K T G L S 

DSRGFPS LHPIDVNGSLFWHQNQRDFP 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 

V • PSKWTEVGNVDIPTQKPVL I LPVKW 
SVPIELDGGWQR - HSNTK^AGFDVPSEL 
ERPNGLRWGUSTLPLKN4CWF • RSKG 



psa I pac I 



AAATCTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCT 

! 1 1 1 1 1 1 1 1 i 1 1 1 1 i 1- 4000 

TTTACAGCATTGTTGAGGCGGGGTAACTGCGTTTACCCGCCATCCGCACATCCCACCCTCCAGATATATTCGTCTCGAGA 

NVVTTPPH • RKWAVGVYGGRSI AEL 
K US QLRP I DAKGR ACTVGGLYKQSS 
K C RN N S APLTQMGGRRVRWEVYISRAL 

I 1 1 1 1 1 1 1 1 1 f 1 i \ I 1 h 

FTTVVGGWQRLHATPTY^P^P^L^D^IYASSE 
IDYCSRGMSAFPRYAHVTPPRYLCLE 
P H R L L E AGNVCIPPLRTRHST- ILLAR 



Rsa I 



CTGGCTAACTAGAGAACCCACTGCTTAACTGGCTTATCGAAATTAATACGACTCACTATAGGGAGACCC ^^^^ 

— H 1 1 1 1 1 1 1 1 1 1 i ^4069 

GACGGATTGATCTCTTGGGTGACGAATTGACCGAATAGCTTTAATTATGCTGAGTGATATCCCTCTGGG 

SG' LENPLLNWLIEINTTHYRET 
L A N . R T H C L T G L K L I R^ L T I G R P 
WLTREPTA • LAYR^H YDSL GDP 

I \ 1 1 1 1 1 — —{ 1 1 1 1 1 1 ^ 

P SSFGSSLQSISILVV- LSVW 
r/a L L V W Q K V > K K I R^ V I P L 6^ 

QSVLSCV A- SA RF YSESYPSG 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Behan, Dominic P. 

Chalmers, Derek T. 
Liaw, Chen 
Lin, I-Lin 
Lowitz, Kevin P. 
Chen, Ruoping 

(ii) TITLE OF INVENTION: Endogenous, Constitutively Activated 

G Protein-Coupled Orphan Receptors 

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: 

(B) STREET: 

(C) CITY: 

(D) STATE: 

(E) COUNTRY: 

(F) ZIP: 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Michael Straher 

(B) REGISTRATION NUMBER: 38,325 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 

(B) TELEFAX: 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
GGAGGATCCATGGCCTGGTTCTCAGC 26 



(3) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
CACAAGCTTAGRCCRTCCMGRCARTTCCA 29 
(4) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GGAGAAGCTTCTGGCGGCGATGAACGCTAG 30

(5) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ACAGGATCCAGGTGGCTGCTAGCAAGAG 28 

(6) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CTTAAGCTTAAAATGAACGAAGACCCGAAG 30

(7) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GGAGGATCCCCAGAGCATCACTAGCAT 27 

(8) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 530 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GGAGGATCCA TGGCCTGGTT CTCAGCCGGC TCAGGCAGTG TGAATGTGAG CATAGACCCA 60
GCAGAGGAAC CTACAGGCCC AGCTACACTG CTGCCCTCTC CCAGGGCCTG GGATGTGGTG 120 
CTGTGCATCT CAGGCACCCT GGTGTCCTGC GAGAATGCTC TGGTGATGGC CATCATTGTG 180 
GGCACGCCTG CCTTCCGCGC CCCCATGTTC CTGCTGGTGG GCAGCTTGGC CGTAGCAGAC 240

CTGCTGGCAG GCCTGGGCCT GGTCCTGCAC TTCGCTGCTG ACTTCTGTAT TGGCTCACCA 300

GAGATGAGCT TGGTGCTGGT TGGCGTGCTA GCAACGGCCT TTACTGCCAG CATCGGCAGC 360 

CTGCTGGCCA TCACCGTTGA CCGCTACCTT TCCCTGTACA ACGCCCTCAC CTACTACTCA 420 

GAGACAACAG TAACTCGAAC CTACGTGATG CTGGCCTTGG TGTGGGTGGG TGCCCTGGGC 480 

CTGGGGCTGG TTCCCGTGCT GGCCTGGAAC TGCCGGGACG GTCTAAGCTT 530 
(9) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 601 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AAGCTTCTGG CGGCGATGAA CGCTAGCGCC GCCGCGCTCA ACGAGTCCCA GGTGGTGGCA 60 

GTAGCGGCCG AGGGAGCGGC AGCTGCGGCT ACAGCAGCAG GGACACCGGA CACCAGCGAA 120

TGGGGACCTC CGGCAGCATC CGCGGCGCTG GGAGGCGGCG GAGGACCTAA CGGGTCACTG 180 

GAGCTGTCTT CGCAGCTGCC CGCAGGACCC TCAGGACTTC TGCTTTCGGC AGTGAATCCC 240 

TGGGATGTGC TGCTGTGCGT GTCGGGGACT GTGATCGCAG GCGAAAATGC GCTGGTGGTG 300

GCGCTCATCG CATCCACTCC CGCGCTGCGC ACGCCCATGT TTGTGCTCGT GGGTAGTCTG 360 

GCCACTGCTG ACCTGCTGGC GGGCTGTGGC CTCATCCTAC ACTTCGTGTT CCAGTACGTG 420 
GTGCCCTCGG AGACTGTGAG CCTGCTCATG GTGGGCTTCC TGGTGGCGTC CTTCGCCGCC 480

TCAGTCAGCA GCCTGCTCGC TATCACAGTG GACCGTTACC TGTCCCTTTA CAACGCGCTC 540 

ACCTACTACT CGCGCCGGAC CCTGTTGGGC GTGCACCTCT TGCTAGCAGC CACCTGGATC 600 

C 601 
(10) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 510 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

AAGCTTAAAA TGAACGAAGA CCCGAAGGTC AATTTAAGCG GGCTGCCTCG GGACTGTATA 60 

GAAGCTGGTA CTCCGGAGAA CATCTCAGCC GCTGTCCCCT CCCAGGGCTC TGTTGTGGAG 120 

TCAGAACCCG AGCTCGTTGT CAACCCCTGG GACATTGTCT TGTGCAGCTC AGGAACCCTC 180 

ATCTGCTGTG AAAATGCCGT CGTGGTCCTT ATCATCTTCC ACAGCCCCAG CCTGCGAGCA 240 

CCCATGTTCC TGCTGATAGG CAGCCTGGCT CTTGCAGACC TGCTGGCTGG TCTGGGACTC 300 

ATCATCAATT TTGTTTTTGC CTACCTGCTT CAGTCAGAAG CCACCAAGCT GGTCACAATT 360 

GGACTCATTG TCGCCTCTTT CTCTGCCTCT GTCTGCAGTT TGCTGGCTAT CACTGTGGAC 420 

CGCTACCTCT CGCTGTATTA CGCCCTGACG TACCACTCCG AGAGGACCGT CACCTTTACC 480 
TATGTCATGC TAGTGATGCT CTGGGGATCC 510

(11) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CTTT^GCTTGTGGCATTTGGTACT 24 

(12) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

TCTGGATCCTTGGCCAGGCAGTGGAAGT 28

(13) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GAGAATTCAC TCCTGAGCTC J^AGATGAACT 30 
(14) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CGGGATCCCC GTAACTGAGC CACTTCAGAT 30 



(15) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1050 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
ATGAACTCCA CCTTGGATGG TAATCAGAGC AGCCACCCTT TTTGCCTCTT GGCATTTGGC 60

TATTTGGAAA CTGTCAATTT TTGCCTTTTG GAAGTATTGA TTATTGTCTT TCTAACTGTA 120 

TTGATTATTT CTGGCAACAT CATTGTGATT TTTGTATTTC ACTGTGCACC TTTGTTGAAC 180 

CATCACACTA CAAGTTATTT TATCCAGACT ATGGCATATG CTGACCTTTT TGTTGGGGTG 240 

AGCTGCGTGG TCCCTTCTTT ATCACTCCTC CATCACCCCC TTCCAGTAGA GGAGTCCTTG 300 

ACTTGCCAGA TATTTGGTTT TGTAGTATCA GTTCTGAAGA GCGTCTCCAT GGCTTCTCTG 360

GCCTGTATCA GCATTGATAG ATACATTGCC ATTACTAAAC CTTTAACCTA TAATACTCTG 420 

GTTACACCCT GGAGACTACG CCTGTGTATT TTCCTGATTT GGCTATACTC GACCCTGGTC 480 

TTCCTGCCTT CCTTTTTCCA CTGGGGCAAA CCTGGATATC ATGGAGATGT GTTTCAGTGG 540

TGTGCGGAGT CCTGGCACAC CGACTCCTAC TTCACCCTGT TCATCGTGAT GATGTTATAT 600 

GCCCCAGCAG CCCTTATTGT CTGCTTCACC TATTTCAACA TCTTCCGCAT CTGCCAACAG 660 

CACACAAAGG ATATCAGCGA AAGGCAAGCC CGCTTCAGCA GCCAGAGTGG GGAGACTGGG 720 

GAAGTGCAGG CCTGTCCTGA TAAGCGCTAT GCCATGGTCC TGTTTCGAAT CACTAGTGTA 780 

TTTTACATCC TCTGGTTGCC ATATATCATC TACTTCTTGT TGGAAAGCTC CACTGGCCAC 840 

AGCAACCGCT TCGCATCCTT CTTGACCACC TGGCTTGCTA TTAGTAACAG TTTCTGCAAC 900 

TGTGTAATTT ATAGTCTCTC CAACAGTGTA TTCCAAAGAG GACTAAAGCG CCTCTCAGGG 960 

GCTATGTGTA CTTCTTGTGC AAGTCAGACT ACAGCCAACG ACCCTTACAC AGTTAGAAGC 1020

AAAGGCCCTC TTAATGGATG TCATATCTGA 1050 
(16) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

Met Asn Ser Thr Leu Asp Gly Asn Gln Ser Ser His Pro Phe Cys Leu
1 5 10 15

Leu Ala Phe Gly Tyr Leu Glu Thr Val Asn Phe Cys Leu Leu Glu Val
20 25 30

Leu Ile Ile Val Phe Leu Thr Val Leu Ile Ile Ser Gly Asn Ile Ile
35 40 45

Val Ile Phe Val Phe His Cys Ala Pro Leu Leu Asn His His Thr Thr
50 55 60

Ser Tyr Phe Ile Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val
65 70 75 80

Ser Cys Val Val Pro Ser Leu Ser Leu Leu His His Pro Leu Pro Val
85 90 95

Glu Glu Ser Leu Thr Cys Gln Ile Phe Gly Phe Val Val Ser Val Leu
100 105 110

Lys Ser Val Ser Met Ala Ser Leu Ala Cys Ile Ser Ile Asp Arg Tyr
115 120 125

Ile Ala Ile Thr Lys Pro Leu Thr Tyr Asn Thr Leu Val Thr Pro Trp
130 135 140

Arg Leu Arg Leu Cys Ile Phe Leu Ile Trp Leu Tyr Ser Thr Leu Val
145 150 155 160

Phe Leu Pro Ser Phe Phe His Trp Gly Lys Pro Gly Tyr His Gly Asp
165 170 175

Val Phe Gln Trp Cys Ala Glu Ser Trp His Thr Asp Ser Tyr Phe Thr
180 185 190

Leu Phe Ile Val Met Met Leu Tyr Ala Pro Ala Ala Leu Ile Val Cys
195 200 205

Phe Thr Tyr Phe Asn Ile Phe Arg Ile Cys Gln Gln His Thr Lys Asp
210 215 220

Ile Ser Glu Arg Gln Ala Arg Phe Ser Ser Gln Ser Gly Glu Thr Gly
225 230 235 240

Glu Val Gln Ala Cys Pro Asp Lys Arg Tyr Ala Met Val Leu Phe Arg
245 250 255

Ile Thr Ser Val Phe Tyr Ile Leu Trp Leu Pro Tyr Ile Ile Tyr Phe
260 265 270

Leu Leu Glu Ser Ser Thr Gly His Ser Asn Arg Phe Ala Ser Phe Leu
275 280 285

Thr Thr Trp Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr
290 295 300

Ser Leu Ser Asn Ser Val Phe Gln Arg Gly Leu Lys Arg Leu Ser Gly
305 310 315 320

Ala Met Cys Thr Ser Cys Ala Ser Gln Thr Thr Ala Asn Asp Pro Tyr
325 330 335

Thr Val Arg Ser Lys Gly Pro Leu Asn Gly Cys His Ile
340 345



(17) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
AGGAAGCTTT AAATTTCCT^ GCCATGAATG 30 

(18) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 
ACCGAATTCA GATTACATTT GATTTACTAT G 31 

(19) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1086 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

ATGAATGAAT CCAGGTGGAC TGAATGGAGG ATCCTGAACA TGAGCAGTGG CATTGTGAAT 60

GTGTCCGAGC GTCACTCCTG CCCACTTGGA TTTGGCCACT ACAGTGTGGT GGATGTCTGC 120

ATCTTCGAGA CAGTGGTTAT TGTGTTGCTG ACATTTCTGA TCATTGCTGG GAATCTAACA 180 

GTTATCTTTG TCTTTCATTG TGCTCCACTG TTACATCATT ATACTACCAG CTATTTCATT 240 

CAGACGATGG CATATGCTGA TCTTTTCGTT GGAGTTAGCT GCTTGGTTCC TACTCTGTCA 300 

CTTCTCCACT ACTCCACAGG TGTCCACGAG TCATTGACTT GCCAGGTTTT TGGATATATC 360 

ATCTCAGTTC TAAAAAGTGT TTCTATGGCA TGTCTTGCTT GCATCAGTGT GGATCGTTAT 420 

CTTGCAATAA CCAAGCCTCT TTCCTACAAT CAACTGGTCA CCCCTTGTCG CTTGAGAATT 480 

TGCATTATTT TGATCTGGAT CTACTCCTGC CTAATTTTCT TGCCTTCCTT TTTTGGCTGG 540 

GGGAAACCTG GTTACCATGG TGACATTTTT GAATGGTGTG CCACGTCTTG GCTCACCAGT 600 

GCCTATTTTA CTGGCTTTAT TGTTTGTTTA CTTTATGCTC CTGCTGCCTT TGTTGTCTGC 660 

TTCACTTACT TCCACATTTT CAAAATTTGC CGTCAGCACA CCAAAGAGAT AAATGACCGA 720 

AGAGCCCGAT TCCCTAGTCA TGAGGTAGAT TCTTCCAGAG AGACTGGACA CAGCCCTGAC 780 
CGTCGCTACG CCATGGTTTT GTTTAGGATA ACCAGTGTAT TTTATATGCT GTGGCTCCCC 840

TATATAATTT ACTTTCTTCT AGAAAGCTCC CGGGTCTTGG ACAATCCAAC TCTGTCCTTC 900 

TTAACAACCT GGCTTGCAAT AAGTAATAGT TTTTGTAACT GTGTAATATA CAGCCTCTCC 960 

AACAGCGTTT TCCGGCTAGG CCTCCGAAGA CTGTCTGAGA CAATGTGCAC ATCCTGTATG 1020

TGTGTGAAGG ATCAGGAAGC ACAAGAACCC AAACCTAGGA AACGGGCTAA TTCTTGCTCC 1080 

ATTTGA 1086 
(20) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Asn Glu Ser Arg Trp Thr Glu Trp Arg Ile Leu Asn Met Ser Ser
1 5 10 15

Gly Ile Val Asn Val Ser Glu Arg His Ser Cys Pro Leu Gly Phe Gly
20 25 30

His Tyr Ser Val Val Asp Val Cys Ile Phe Glu Thr Val Val Ile Val
35 40 45

Leu Leu Thr Phe Leu Ile Ile Ala Gly Asn Leu Thr Val Ile Phe Val
50 55 60

Phe His Cys Ala Pro Leu Leu His His Tyr Thr Thr Ser Tyr Phe Ile
65 70 75 80

Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val Ser Cys Leu Val
85 90 95

Pro Thr Leu Ser Leu Leu His Tyr Ser Thr Gly Val His Glu Ser Leu
100 105 110

Thr Cys Gln Val Phe Gly Tyr Ile Ile Ser Val Leu Lys Ser Val Ser
115 120 125

Met Ala Cys Leu Ala Cys Ile Ser Val Asp Arg Tyr Leu Ala Ile Thr
130 135 140

Lys Pro Leu Ser Tyr Asn Gln Leu Val Thr Pro Cys Arg Leu Arg Ile
145 150 155 160

Cys Ile Ile Leu Ile Trp Ile Tyr Ser Cys Leu Ile Phe Leu Pro Ser
165 170 175

Phe Phe Gly Trp Gly Lys Pro Gly Tyr His Gly Asp Ile Phe Glu Trp
180 185 190

Cys Ala Thr Ser Trp Leu Thr Ser Ala Tyr Phe Thr Gly Phe Ile Val
195 200 205

Cys Leu Leu Tyr Ala Pro Ala Ala Phe Val Val Cys Phe Thr Tyr Phe
210 215 220

His Ile Phe Lys Ile Cys Arg Gln His Thr Lys Glu Ile Asn Asp Arg
225 230 235 240

Arg Ala Arg Phe Pro Ser His Glu Val Asp Ser Ser Arg Glu Thr Gly
245 250 255

His Ser Pro Asp Arg Arg Tyr Ala Met Val Leu Phe Arg Ile Thr Ser
260 265 270

Val Phe Tyr Met Leu Trp Leu Pro Tyr Ile Ile Tyr Phe Leu Leu Glu
275 280 285

Ser Ser Arg Val Leu Asp Asn Pro Thr Leu Ser Phe Leu Thr Thr Trp
290 295 300

Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr Ser Leu Ser
305 310 315 320

Asn Ser Val Phe Arg Leu Gly Leu Arg Arg Leu Ser Glu Thr Met Cys
325 330 335

Thr Ser Cys Met Cys Val Lys Asp Gln Glu Ala Gln Glu Pro Lys Pro
340 345 350

Arg Lys Arg Ala Asn Ser Cys Ser Ile
355 360

(21) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
AGCGAATTCT GCCCACCCCA CGCCGAGGTG CT 32 



(22) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TGCGGATCCG CCAGCTCTTG AGCCTGCACA 30
(23) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1381 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

GGCCTTATCT TTCCAGTCGT CCAGCATGCT CTGCCCACCC CACGCCGAGG TGCACTGACC 60 

ATGAGCCTCA ACTCCTCCCT CAGCTGCAGG AAGGAGCTGA GTAATCTCAC TGAGGAGGAG 120 

GGTGGCGAAG GGGGCGTCAT CATCACCCAG TTCATCGCCA TCATTGTCAT CACCATTTTT 180 

GTCTGCCTGG GAAACCTGGT CATCGTGGTC ACCTTGTACA AGAAGTCCTA CCTCCTCACC 240

CTCAGCAACA AGTTCGTCTT CAGCCTGACT CTGTCCAACT TCCTGCTGTC CGTGTTGGTG 300 



CTGCCTTTTG TGGTGACGAG CTCCATCCGC AGGGAATGGA TCTTTGGTGT AGTGTGGTGC 360 
AACTTCTCTG CCCTCCTCTA CCTGCTGATC AGCTCTGCCA GCATGCTAAC CCTCGGGGTC 420

ATTGCCATCG ACCGCTACTA TGCTGTCCTG TACCCCATGG TGTACCCCAT GAAGATCACA 480 

GGGAACCGGG CTGTGATGGC ACTTGTCTAC ATCTGGCTTC ACTCGCTCAT CGGCTGCCTG 540 

CCACCCCTGT TTGGTTGGTC ATCCGTGGAG TTTGACGAGT TCAAATGGAT GTGTGTGGCT 600 

GCTTGGCACC GGGAGCCTGG CTACACGGCC TTCTGGCAGA TCTGGTGTGC CCTCTTCCCC 660 

TTTCTGGTCA TGCTGGTGTG CTATGGCTTC ATCTTCCGCG TGGCCAGGGT CAAGGCACGC 720 

AAGGTGCACT GTGGCACAGT CGTCATCGTG GAGGAGGATG CTCAGAGGAC CGGGAGGAAG 780 

AACTCCAGCA CCTCCACCTC CTCTTCAGGC AGCAGGAGGA ATGCCTTTCA GGGTGTGGTC 840 

TACTCGGCCA ACCAGTGCAA AGCCCTCATC ACCATCCTGG TGGTCCTCGG TGCCTTCATG 900 

GTCACCTGGG GCCCCTACAT GGTTGTCATC GCCTCTGAGG CCCTCTGGGG GAAAAGCTCC 960 

GTCTCCCCGA GCCTGGAGAC TTGGGCCACA TGGCTGTCCT TTGCCAGCGC TGTCTGCCAC 1020 

CCCCTGATCT ATGGACTCTG GAACAAGACA GTTCGCAAAG AACTACTGGG CATGTGCTTT 1080 

GGGGACCGGT ATTATCGGGA ACCATTTGTG CAACGACAGA GGACTTCCAG GCTCTTCAGC 1140 

ATTTCCAACA GGATCACAGA CCTGGGCCTG TCCCCACACC TCACTGCGCT CATGGCAGGT 1200 

GGACAGCCCC TGGGGCACAG CAGCAGCACG GGGGACACTG GCTTCAGCTG CTCCCAGGAC 1260 

TCAGGTAACC TGCGTGCTTT ATAAGCCTCT CACCTGTCGC GTTTTCCCTG TGTTGCGTTT 1320 

CCCCCGTGTC GCGTTTCCCC TGTGCAGGCT CAAGAGCTGG CGGAGGGGCA TTTCCCACGG 1380 

TG 1382 
(24) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 407 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
MSLNSSLSCR KELSNLTEEE GGEGGVIITQ FIAIIVITIF VCLGNLVIVV
TLYKKSYLLT LSNKFVFSLT LSNFLLSVLV LPFVVTSSIR REWIFGVVWC
NFSALLYLLI SSASMLTLGV IAIDRYYAVL YPMVYPMKIT GNRAVMALVY
IWLHSLIGCL PPLFGWSSVE FDEFKWMCVA AWHREPGYTA FWQIWCALFP
FLVMLVCYGF IFRVARVKAR KVHCGTVVIV EEDAQRTGRK NSSTSTSSSG
SRRNAFQGVV YSANQCKALI TILVVLGAFM VTWGPYMVVI ASEALWGKSS
VSPSLETWAT WLSFASAVCH PLIYGLWNKT VRKELLGMCF GDRYYREPFV
QRQRTSRLFS ISNRITDLGL SPHLTALMAG GQPLGHSSST GDTGFSCSQD
SGNLRAL



(25) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GGAAGCTTCA GGCCCAAAGA TGGGGAACAT 30 
(26) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
GTGGATCCAC CCGCGGAGGA CCCAGGCTAG 30 



(27) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1697 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ACTCCCAAAG TGCTGGGCTT ACAGGTGT/VA GCCATCATGT CCAGCCGTTC AGATATTCTA 60

GTTGAATTGG AGTTGGTGGG CTAGTACACC TTCTAAATTA AATGAGTAAA GGATTTAGAA 120

TGGTGCCTGA CACACAGTAG GTGCTACATT CATGTTAGCT ACTATTATAA ACCTTTCCTG 180 

CCTCTGACTT TCAGGGTCTT GCCCACCACC AGCGATGCCC AGCCCTTGGT AGAGCTTGAA 240 

CCACCTTCTA TAAACAGGAT GGCGGTGGAG AGACAGGCCC AGTCCCTGAG CCCATGAGGA 300 

GTGTGGCCCC TTCAGGCCCA AAGATGGGGA ACATCACTGC AGACAACTCC TCGATGAGCT 360

GTACCATCGA CCATACCATC CACCAGACGC TGGCCCCGGT GGTCTATGTT ACCGTGCTGG 420 

TGGTGGGCTT CCCGGCCAAC TGCCTGTCCC TCTACTTCGG CTACCTGCAG ATCAAGGCCC 480 

GGAACGAGCT GGGCGTGTAC CTGTGCAACC TGACGGTGGC CGACCTCTTC TACATCTGCT 540 

CGCTGCCCTT CTGGCTGCAG TACGTGCTGC AGCACGACAA CTGGTCTCAC GGCGACCTGT 600 

CCTGCCAGGT GTGCGGCATC CTCCTGTACG AGAACATCTA CATCAGCGTG GGCTTCCTCT 660 

GCTGCATCTC CGTGGACCGC TACCTGGCTG TGGCCCATCC CTTCCGCTTC CACCAGTTCC 720 

GGACCCTGAA GGCGGCCGTC GGCGTCAGCG TGGTCATCTG GGCCAAGGAG CTGCTGACCA 780 

GCATCTACTT CCTGATGCAC GAGGAGGTCA TCGAGGACGA GAACCAGCAC CGCGTGTGCT 840 

TTGAGCACTA CCCCATCCAG GCATGGCAGC GCGCCATCAA CTACTACCGC TTCCTGGTGG 900 

GCTTCCTCTT CCCCATCTGC CTGCTGCTGG CGTCCTACCA GGGCATCCTG CGCGCCGTGC 960 

GCCGGAGCCA CGGCACCCAG AAGAGCCGCA AGGACCAGAT CCAGCGGCTG GTGCTCAGCA 1020 

CCGTGGTCAT CTTCCTGGCC TGCTTCCTGC CCTACCACGT GTTGCTGCTG GTGCGCAGCG 1080 

TCTGGGAGGC CAGCTGCGAC TTCGCCAAGG GCGTTTTCAA CGCCTACCAC TTCTCCCTCC 1140 
TGCTCACCAG CTTCAACTGC GTCGCCGACC CCGTGCTCTA CTGCTTCGTC AGCGAGACCA 1200

CCCACCGGGA CCTGGCCCGC CTCCGCGGGG CCTGCCTGGC CTTCCTCACC TGCTCCAGGA 1260 

CCGGCCGGGC CAGGGAGGCC TACCCGCTGG GTGCCCCCGA GGCCTCCGGG AAAAGCGGGG 1320 

CCCAGGGTGA GGAGCCCGAG CTGTTGACCA AGCTCCACCC GGCCTTCCAG ACCCCTAACT 1380 

CGCCAGGGTC GGGCGGGTTC CCCACGGGCA GGTTGGCCTA GCCTGGGTCC TCCGCGGGTG 1440

GCTCCACGTG AGGCCTGAGC CTTCAGCCCA CGGGCCTCAG GGCCTGCCGC CTCCTGCTTC 1500 

CCTCGCTGCG GAGGCAGGGA AGCCCCTGTA ACTCCGGAAG CCTGCTCTCG CTTGCTGAGC 1560 

CCGCTGGGAC CGCCGAGGGT GGGAATAAGC CCCGGTTGGC TCGTGGGAAT AAGCCGTGTC 1620

CTCTGCCGCG GCTGCGATGT GGCCACGCTG GGGCTGCTGG TCGGGGGAAA ACAGTGAACT 1680 

GCGTCCCCTG GCCTGCT 1697 

(28) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 365 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

MGNITADNSS MSCTIDHTIH QTLAPVVYVT VLVVGFPANC LSLYFGYLQI
KARNELGVYL CNLTVADLFY ICSLPFWLQY VLQHDNWSHG DLSCQVCGIL






LYENIYISVG FLCCISVDRY LAVAHPFRFH QFRTLKAAVG VSVVIWAKEL
LTSIYFLMHE EVIEDENQHR VCFEHYPIQA WQRAINYYRF LVGFLFPICL
LLASYQGILR AVRRSHGTQK SRKDQIQRLV LSTVVIFLAC FLPYHVLLLV
RSVWEASCDF AKGVFNAYHF SLLLTSFNCV ADPVLYCFVS ETTHRDLARL
RGACLAFLTC SRTGRAREAY PLGAPEASGK SGAQGEEPEL LTKLHPAFQT
PNSPGSGGFP TGRLA

(29) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CTGGTCCTGC ACTTTGCTGC 20 

(30) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AGCATCACAT AGGTCCGTGT CAC 23 

(31) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
ACCAGAAAGG GTGTGGGTAC ACTG 24 

(32) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
GGAACGAAAG GGCACTTTGG 20 

(33) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GCTGCCTCGG GATTATTTAG 20

(34) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GCCTATTAGC AGGAACATGG GTG 23

(35) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

GCTAGCGTTC ATCGCCGC 18
(36) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CTGGACTGTA TCGCCCCG 18 



(37) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GATCTCTAGA ATGATGTGGG GTGCAGGCAG CC 32 

(38) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CTAGGGTACC CGGACATCAC TGGGGGAGCG GGATC 35

(39) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GATCTCTAGA ATGCAGGGTG CAAATCCGGC C 31

(40) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CTAGGGTACC CGGACCTCGC TGGGAGACCT GGAAC 35

(41) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

ATGTGGAACG CGACGCCCAG CG 22 

(42) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
TCATGTATTA ATACTAGATT CT 42 

(43) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
TACCATGTGG AACGCGACGC CCAGCGAAGA GCCGGGGT 38

(44) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
CGGAATTCAT GTATTAATAC TAGATTCTGT CCAGGCCCG 39

(45) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1101 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTGGC CGACCTGGAC 60 
TGGGATGCTT CCCCCGGCAA CGACTCGCTG GGCGACGAGC TGCTGCAGCT CTTCCCCGCG 120
CCGCTGCTGG CGGGCGTCAC AGCCACCTGC GTGGCACTCT TCGTGGTGGG TATCGCTGGC 180 



AACCTGCTCA CCATGCTGGT GGTGTCGCGC TTCCGCGAGC TGCGCACCAC CACCAACCTC 240 
TACCTGTCCA GCATGGCCTT CTCCGATCTG CTCATCTTCC TCTGCATGCC CCTGGACCTC 300

GTTCGCCTCT GGCAGTACCG GCCCTGGAAC TTCGGCGACC TCCTCTGCAA ACTCTTCCAA 360 

TTCGTCAGTG AGAGCTGCAC CTACGCCACG GTGCTCACCA TCACAGCGCT GAGCGTCGAG 420 

CGCTACTTCG CCATCTGCTT CCCACTCCGG GCCAAGGTGG TGGTCACCAA GGGGCGGGTG 480 

AAGCTGGTCA TCTTCGTCAT CTGGGCCGTG GCCTTCTGCA GCGCCGGGCC CATCTTCGTG 540 

CTAGTCGGGG TGGAGCACGA GAACGGCACC GACCCTTGGG ACACCAACGA GTGCCGCCCC 600

ACCGAGTTTG CGGTGCGCTC TGGACTGCTC ACGGTCATGG TGTGGGTGTC CAGCATCTTC 660

TTCTTCCTTC CTGTCTTCTG TCTCACGGTC CTCTACAGTC TCATCGGCAG GAAGCTGTGG 720 

CGGAGGAGGC GCGGCGATGC TGTCGTGGGT GCCTCGCTCA GGGACCAGAA CCACAAGCAA 780 

ACCGTGAAAA TGCTGGCTGT AGTGGTGTTT GCCTTCATCC TCTGCTGGCT CCCCTTCCAC 840 

GTAGGGCGAT ATTTATTTTC CAAATCCTTT GAGCCTGGCT CCTTGGAGAT TGCTCAGATC 900 

AGCCAGTACT GCAACCTCGT GTCCTTTGTC CTCTTCTACC TCAGTGCTGC CATCAACCCC 960 

ATTCTGTACA ACATCATGTC CAAGAAGTAC CGGGTGGCAG TGTTCAGACT TCTGGGATTC 1020

GAACCCTTCT CCCAGAGAAA GCTCTCCACT CTGAAAGATG AAAGTTCTCG GGCCTGGACA 1080

GAATCTAGTA TTAATACATG A 1101 
(46) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 366 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
MWNATPSEEP GFNLTLADLD WDASPGNDSL GDELLQLFPA PLLAGVTATC VALFVVGIAG
NLLTMLVVSR FRELRTTTNL YLSSMAFSDL LIFLCMPLDL VRLWQYRPWN FGDLLCKLFQ
FVSESCTYAT VLTITALSVE RYFAICFPLR AKVVVTKGRV KLVIFVIWAV AFCSAGPIFV
LVGVEHENGT DPWDTNECRP TEFAVRSGLL TVMVWVSSIF FFLPVFCLTV LYSLIGRKLW
RRRRGDAVVG ASLRDQNHKQ TVKMLAVVVF AFILCWLPFH VGRYLFSKSF EPGSLEIAQI
SQYCNLVSFV LFYLSAAINP ILYNIMSKKY RVAVFRLLGF EPFSQRKLST LKDESSRAWT
ESSINT